Setting phi_convict_threshold to 12 is a good idea if your network is busy. Are your VMs located in different datacenters? Have you checked whether the nodes are overloaded? An unresponsive node can be marked as down even if it is only unresponsive temporarily.
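To see how long each DOWN period actually lasts, the Gossiper status lines in the log can be paired up. Here is a rough Python sketch, assuming each status entry sits on a single line in the log file and uses the same format as the entries in your message. The names `down_intervals` and `SAMPLE_LOG` are only illustrative; they are not part of Cassandra.

```python
import re
from datetime import datetime

# Matches Gossiper status lines in the format quoted in this thread.
# Assumption: timestamp and "is now UP/DOWN" appear on the same line.
STATUS_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*"
    r"InetAddress (/\S+) is now (UP|DOWN)"
)

# Log entries copied from the original message, with wrapped lines rejoined.
SAMPLE_LOG = """\
 INFO [GossipTasks:1] 2014-02-28 07:51:10,937 Gossiper.java (line 863) InetAddress /10.150.100.20 is now DOWN
 INFO [HANDSHAKE-/10.150.100.20] 2014-02-28 07:51:10,951 OutboundTcpConnection.java (line 386) Handshaking version with /10.150.100.20
 INFO [RequestResponseStage:898] 2014-02-28 07:51:21,411 Gossiper.java (line 849) InetAddress /10.150.100.20 is now UP
 INFO [GossipTasks:1] 2014-02-28 08:06:52,956 Gossiper.java (line 863) InetAddress /10.150.100.20 is now DOWN
 INFO [RequestResponseStage:915] 2014-02-28 08:07:21,447 Gossiper.java (line 849) InetAddress /10.150.100.20 is now UP
"""


def down_intervals(log_text):
    """Return (address, down_time, up_time, seconds) for each DOWN -> UP pair."""
    went_down = {}   # address -> time it was first marked DOWN
    intervals = []
    for line in log_text.splitlines():
        match = STATUS_RE.search(line)
        if match is None:
            continue  # handshake lines and anything else are skipped
        stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
        address, state = match.group(2), match.group(3)
        if state == "DOWN":
            # Keep the earliest DOWN if several appear before the next UP.
            went_down.setdefault(address, stamp)
        elif address in went_down:
            start = went_down.pop(address)
            seconds = (stamp - start).total_seconds()
            intervals.append((address, start, stamp, seconds))
    return intervals


if __name__ == "__main__":
    for address, start, end, seconds in down_intervals(SAMPLE_LOG):
        print(f"{address} down from {start} to {end} ({seconds:.3f} s)")
```

For the entries you posted, this gives outages of about 10.5 s and 28.5 s, which can then be compared against the load on the nodes at those times.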
Romain

Phil Luckhurst <phil.luckhu...@powerassure.com> wrote on 03/03/2014 15:16:25:

> From: Phil Luckhurst <phil.luckhu...@powerassure.com>
> To: cassandra-u...@incubator.apache.org,
> Date: 03/03/2014 15:17
> Subject: Gossip intermittently marks node as DOWN
>
> We have a 2 node Cassandra 2.0.5 cluster running on a couple of VMWare hosted
> virtual machines using Ubuntu 12.04 for testing. As you can see from the log
> entries below the gossip connection between the nodes regularly goes DOWN
> and UP. We saw on another post that increasing the phi_convict_threshold may
> help with this so we increased that to '12' but we still get the same
> problem.
>
> INFO [GossipTasks:1] 2014-02-28 07:51:10,937 Gossiper.java (line 863) InetAddress /10.150.100.20 is now DOWN
> INFO [HANDSHAKE-/10.150.100.20] 2014-02-28 07:51:10,951 OutboundTcpConnection.java (line 386) Handshaking version with /10.150.100.20
> INFO [RequestResponseStage:898] 2014-02-28 07:51:21,411 Gossiper.java (line 849) InetAddress /10.150.100.20 is now UP
> INFO [HANDSHAKE-/10.150.100.20] 2014-02-28 07:53:52,100 OutboundTcpConnection.java (line 386) Handshaking version with /10.150.100.20
> INFO [GossipTasks:1] 2014-02-28 08:06:52,956 Gossiper.java (line 863) InetAddress /10.150.100.20 is now DOWN
> INFO [HANDSHAKE-/10.150.100.20] 2014-02-28 08:06:52,963 OutboundTcpConnection.java (line 386) Handshaking version with /10.150.100.20
> INFO [RequestResponseStage:915] 2014-02-28 08:07:21,447 Gossiper.java (line 849) InetAddress /10.150.100.20 is now UP
> INFO [HANDSHAKE-/10.150.100.20] 2014-02-28 08:14:09,613 OutboundTcpConnection.java (line 386) Handshaking version with /10.150.100.20
>
> Has anyone got any suggestions for fixing this?
>
> Thanks
> Phil