Re: Gossip intermittently marks node as DOWN
I think we've found the issue! It seems that the clocks on those Cassandra servers were being kept in sync by VMware Tools, using the time of the VMware host machine. We have now turned that off and are using the NTP service to keep the clocks in sync, as we do for our physical servers, and we have not seen any gossip failures in the last 24 hours.
-- 
Phil
-- 
View this message in context: http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Gossip-intermittently-marks-node-as-DOWN-tp7593189p7593569.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at Nabble.com.
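The thread does not explain why VMware Tools time synchronization caused the flapping. One plausible mechanism, offered here only as an assumption: if a heartbeat tracker measures "time since the last heartbeat" with the wall clock, a sudden forward step of the guest clock (as a host-driven sync may do) looks like a long silence from the peer. The sketch below uses a hypothetical `HeartbeatTracker` and a simulated, steppable `FakeWallClock` to show the effect; neither is Cassandra code.

```python
import time
from typing import Callable


class HeartbeatTracker:
    """Hypothetical tracker that timestamps heartbeats with an injectable clock."""

    def __init__(self, clock: Callable[[], float]) -> None:
        self.clock = clock
        self.last = clock()

    def heartbeat(self) -> None:
        # Record the arrival time of the peer's latest heartbeat.
        self.last = self.clock()

    def silence(self) -> float:
        # Seconds since the last heartbeat, as measured by this tracker's clock.
        return self.clock() - self.last


class FakeWallClock:
    """Simulated wall clock that can be stepped, like a host-driven time sync."""

    def __init__(self, start: float) -> None:
        self.now = start

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


if __name__ == "__main__":
    wall = FakeWallClock(start=1000.0)
    tracker = HeartbeatTracker(wall)
    wall.advance(1.0)       # one real second passes
    tracker.heartbeat()     # the peer's heartbeat arrives on time
    wall.advance(1.0)       # another real second passes
    wall.advance(30.0)      # hypothetical 30 s forward step from a time sync
    print(tracker.silence())  # 31.0 seconds of apparent silence

    # time.monotonic() is not affected by system clock updates,
    # so it reports only the real elapsed time.
    mono = HeartbeatTracker(time.monotonic)
    mono.heartbeat()
    print(mono.silence() < 1.0)
```

The design point is simply that interval measurements are safer on a monotonic clock; whether a given Cassandra version used a wall clock for this is not established by the thread.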
RE: Gossip intermittently marks node as DOWN
Setting phi_convict_threshold to 12 is a good idea if your network is busy. Are your VMs located in different datacenters? Have you checked whether the nodes are overloaded? A node can be marked as down even if it is only temporarily unresponsive.
Romain
Phil Luckhurst phil.luckhu...@powerassure.com wrote on 03/03/2014 15:16:25:
From: Phil Luckhurst phil.luckhu...@powerassure.com
To: cassandra-u...@incubator.apache.org
Date: 03/03/2014 15:17
Subject: Gossip intermittently marks node as DOWN
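Cassandra's failure detector is a phi accrual detector: a peer is convicted (marked DOWN) once its suspicion level phi exceeds phi_convict_threshold. The sketch below assumes exponentially distributed heartbeat inter-arrival times, an approximation we believe Cassandra versions of this era use, and a hypothetical mean interval of about one second; treat it as an illustration, not Cassandra's code. The function names `phi` and `silence_before_conviction` are made up for this example.

```python
import math


def phi(elapsed_s: float, mean_interval_s: float) -> float:
    """Suspicion level for a peer whose last heartbeat was elapsed_s ago.

    Assumes exponential inter-arrival times, so P(still waiting) is
    exp(-elapsed/mean) and phi = -log10 of that probability.
    """
    return elapsed_s / (mean_interval_s * math.log(10))


def silence_before_conviction(threshold: float, mean_interval_s: float) -> float:
    """Seconds of silence after which phi exceeds the given threshold."""
    return threshold * mean_interval_s * math.log(10)


if __name__ == "__main__":
    mean = 1.0  # hypothetical: heartbeats arrive roughly once per second
    for threshold in (8, 12):  # 8 is the cassandra.yaml default
        print(threshold, round(silence_before_conviction(threshold, mean), 2))
```

Under these assumptions, raising the threshold from the default of 8 to 12 only stretches the tolerated silence from roughly 18.4 s to roughly 27.6 s, so a pause or clock jump longer than that would still produce a DOWN.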
RE: Gossip intermittently marks node as DOWN
The VMs are hosted on the same ESXi server and run only Cassandra. This happens even when the nodes appear to be idle, about 2 to 4 times per hour.
Phil
Re: Gossip intermittently marks node as DOWN
From what I understand, this can happen when there are many nodes and many vnodes per node. How many vnodes have you configured on your nodes?
2014-03-04 11:37 GMT+01:00 Phil Luckhurst phil.luckhu...@powerassure.com:
> The VMs are hosted on the same ESXi server and they are just running Cassandra. We seem to get this happen even if the nodes appear to be idle; about 2 to 4 times per hour.
-- 
Close the World, Open the Net
http://www.linux-wizard.net
Re: Gossip intermittently marks node as DOWN
The cluster was created with the default settings, so we have 256 vnodes per node.
Fabrice Facorat wrote:
> How many vnodes did you configure on your nodes?
Re: Gossip intermittently marks node as DOWN
What is nodetool tpstats telling you?
On 4 Mar 2014, at 15:10, Phil Luckhurst phil.luckhu...@powerassure.com wrote:
> It was created with the default settings so we have 256 per node.
Re: Gossip intermittently marks node as DOWN
Here's the tpstats output from both nodes.
Johnny Miller wrote:
> What is nodetool tpstats telling you?
Re: Gossip intermittently marks node as DOWN
That looks healthy: nothing blocked or dropped.
On 4 Mar 2014, at 16:12, Phil Luckhurst phil.luckhu...@powerassure.com wrote:
> Here's the tpstats output from both nodes.
Gossip intermittently marks node as DOWN
We have a 2-node Cassandra 2.0.5 cluster for testing, running on a couple of VMware-hosted virtual machines with Ubuntu 12.04. As you can see from the log entries below, the gossip connection between the nodes regularly goes DOWN and then UP again. We saw in another post that increasing phi_convict_threshold may help with this, so we increased it to 12, but we still get the same problem.
INFO [GossipTasks:1] 2014-02-28 07:51:10,937 Gossiper.java (line 863) InetAddress /10.150.100.20 is now DOWN
INFO [HANDSHAKE-/10.150.100.20] 2014-02-28 07:51:10,951 OutboundTcpConnection.java (line 386) Handshaking version with /10.150.100.20
INFO [RequestResponseStage:898] 2014-02-28 07:51:21,411 Gossiper.java (line 849) InetAddress /10.150.100.20 is now UP
INFO [HANDSHAKE-/10.150.100.20] 2014-02-28 07:53:52,100 OutboundTcpConnection.java (line 386) Handshaking version with /10.150.100.20
INFO [GossipTasks:1] 2014-02-28 08:06:52,956 Gossiper.java (line 863) InetAddress /10.150.100.20 is now DOWN
INFO [HANDSHAKE-/10.150.100.20] 2014-02-28 08:06:52,963 OutboundTcpConnection.java (line 386) Handshaking version with /10.150.100.20
INFO [RequestResponseStage:915] 2014-02-28 08:07:21,447 Gossiper.java (line 849) InetAddress /10.150.100.20 is now UP
INFO [HANDSHAKE-/10.150.100.20] 2014-02-28 08:14:09,613 OutboundTcpConnection.java (line 386) Handshaking version with /10.150.100.20
Has anyone got any suggestions for fixing this?
Thanks
Phil
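To see how long each outage lasted, the DOWN/UP pairs in these logs can be timed. This is a minimal sketch that assumes the Gossiper log line format exactly as quoted above; the regex and the function name `down_intervals` are illustrative and not part of Cassandra.

```python
import re
from datetime import datetime

# Matches the Gossiper state-change lines in the format quoted in this thread.
PATTERN = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) Gossiper\.java \(line \d+\) "
    r"InetAddress (\S+) is now (UP|DOWN)"
)


def down_intervals(lines):
    """Yield (peer, down_at, seconds_down) for each DOWN later followed by UP."""
    down_since = {}
    for line in lines:
        m = PATTERN.search(line)
        if not m:
            continue  # skip handshake and other unrelated lines
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
        peer, state = m.group(2), m.group(3)
        if state == "DOWN":
            down_since[peer] = ts
        elif peer in down_since:
            start = down_since.pop(peer)
            yield peer, start, (ts - start).total_seconds()


LOG = """\
INFO [GossipTasks:1] 2014-02-28 07:51:10,937 Gossiper.java (line 863) InetAddress /10.150.100.20 is now DOWN
INFO [HANDSHAKE-/10.150.100.20] 2014-02-28 07:51:10,951 OutboundTcpConnection.java (line 386) Handshaking version with /10.150.100.20
INFO [RequestResponseStage:898] 2014-02-28 07:51:21,411 Gossiper.java (line 849) InetAddress /10.150.100.20 is now UP
INFO [GossipTasks:1] 2014-02-28 08:06:52,956 Gossiper.java (line 863) InetAddress /10.150.100.20 is now DOWN
INFO [RequestResponseStage:915] 2014-02-28 08:07:21,447 Gossiper.java (line 849) InetAddress /10.150.100.20 is now UP
"""

if __name__ == "__main__":
    for peer, start, secs in down_intervals(LOG.splitlines()):
        print(peer, start.time(), secs)
```

On the state-change lines above, it reports outages of about 10.5 s and 28.5 s for /10.150.100.20.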