Re: Gossip intermittently marks node as DOWN

2014-03-19 Thread Phil Luckhurst
I think we've found the issue!

It seems that the time on those Cassandra servers was being kept in sync by
VMware Tools, using the clock of the VMware host machine. We have now turned
that off and are using the NTP service to keep the time in sync, as we do
for our physical servers, and we have not seen the gossip failures for the
last 24 hours.

--
Phil
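
Editorial sketch: one way to check whether a guest's wall clock is being
stepped is to compare it against a monotonic clock over a short interval.
That host-driven time sync steps the clock is an assumption here, not
something the thread confirms; the function names and the 0.5 second
tolerance are hypothetical.

```python
import time


def measure_clock_step(sleep_seconds=1.0):
    """Return wall-clock elapsed time minus monotonic elapsed time, in seconds.

    A value near zero means the wall clock advanced normally. A large positive
    or negative value suggests the wall clock was stepped during the interval.
    """
    wall_start = time.time()
    mono_start = time.monotonic()
    time.sleep(sleep_seconds)
    wall_elapsed = time.time() - wall_start
    mono_elapsed = time.monotonic() - mono_start
    return wall_elapsed - mono_elapsed


def clock_stepped(step_seconds, tolerance=0.5):
    """Treat any difference larger than the (hypothetical) tolerance as a step."""
    return abs(step_seconds) > tolerance
```

Calling measure_clock_step() in a loop on each node and logging the values
for which clock_stepped() returns True would show whether steps line up with
the DOWN entries in system.log.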



--
View this message in context: 
http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/Gossip-intermittently-marks-node-as-DOWN-tp7593189p7593569.html
Sent from the cassandra-u...@incubator.apache.org mailing list archive at 
Nabble.com.


RE: Gossip intermittently marks node as DOWN

2014-03-04 Thread Romain HARDOUIN
Setting phi_convict_threshold to 12 is a good idea if your network is busy.
Are your VMs located in different datacenters?
Did you check whether the nodes are overloaded? An unresponsive node can be
seen as down, even if only temporarily.

Romain
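
Editorial sketch: to give a feel for what raising phi_convict_threshold from
the default of 8 to 12 changes, here is a simplified phi accrual detector.
It assumes heartbeat intervals follow an exponential distribution, so that
phi = elapsed / (mean_interval * ln 10). The class and function names are
hypothetical and this is not Cassandra's actual code.

```python
import math
from collections import deque


class PhiAccrualSketch:
    """Simplified phi accrual failure detector (exponential model, assumed)."""

    def __init__(self, window_size=1000):
        # Sliding window of recent heartbeat inter-arrival times, in seconds.
        self.intervals = deque(maxlen=window_size)
        self.last_heartbeat = None

    def heartbeat(self, now):
        """Record a heartbeat seen at time `now` (seconds)."""
        if self.last_heartbeat is not None:
            self.intervals.append(now - self.last_heartbeat)
        self.last_heartbeat = now

    def phi(self, now):
        """Suspicion level at time `now`; it grows with the silence so far."""
        if not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        if mean <= 0:
            return 0.0
        elapsed = now - self.last_heartbeat
        return elapsed / (mean * math.log(10))


def silence_needed(threshold, mean_interval):
    """Seconds without a heartbeat before phi reaches `threshold`."""
    return threshold * mean_interval * math.log(10)
```

Under this model, with heartbeats arriving about once per second, a threshold
of 8 is reached after roughly 18.4 seconds of silence and a threshold of 12
after roughly 27.6 seconds. A sudden jump in the clock used to timestamp
heartbeats would look the same as a long silence.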

Phil Luckhurst phil.luckhu...@powerassure.com wrote on 03/03/2014 15:16:25:

 From: Phil Luckhurst phil.luckhu...@powerassure.com
 To: cassandra-u...@incubator.apache.org
 Date: 03/03/2014 15:17
 Subject: Gossip intermittently marks node as DOWN
 
 We have a 2 node Cassandra 2.0.5 cluster running on a couple of VMWare hosted
 virtual machines using Ubuntu 12.04 for testing. As you can see from the log
 entries below the gossip connection between the nodes regularly goes DOWN
 and UP. We saw on another post that increasing the phi_convict_threshold may
 help with this so we increased that to '12' but we still get the same
 problem.
 
  INFO [GossipTasks:1] 2014-02-28 07:51:10,937 Gossiper.java (line 863) InetAddress /10.150.100.20 is now DOWN
  INFO [HANDSHAKE-/10.150.100.20] 2014-02-28 07:51:10,951 OutboundTcpConnection.java (line 386) Handshaking version with /10.150.100.20
  INFO [RequestResponseStage:898] 2014-02-28 07:51:21,411 Gossiper.java (line 849) InetAddress /10.150.100.20 is now UP
  INFO [HANDSHAKE-/10.150.100.20] 2014-02-28 07:53:52,100 OutboundTcpConnection.java (line 386) Handshaking version with /10.150.100.20
  INFO [GossipTasks:1] 2014-02-28 08:06:52,956 Gossiper.java (line 863) InetAddress /10.150.100.20 is now DOWN
  INFO [HANDSHAKE-/10.150.100.20] 2014-02-28 08:06:52,963 OutboundTcpConnection.java (line 386) Handshaking version with /10.150.100.20
  INFO [RequestResponseStage:915] 2014-02-28 08:07:21,447 Gossiper.java (line 849) InetAddress /10.150.100.20 is now UP
  INFO [HANDSHAKE-/10.150.100.20] 2014-02-28 08:14:09,613 OutboundTcpConnection.java (line 386) Handshaking version with /10.150.100.20
 
  Has anyone got any suggestions for fixing this? 
 
  Thanks 
  Phil


RE: Gossip intermittently marks node as DOWN

2014-03-04 Thread Phil Luckhurst
The VMs are hosted on the same ESXi server and they are only running
Cassandra. This seems to happen even when the nodes appear to be idle,
about 2 to 4 times per hour.


Phil





Re: Gossip intermittently marks node as DOWN

2014-03-04 Thread Fabrice Facorat
From what I understand, this can happen when there are many nodes and
many vnodes per node. How many vnodes did you configure on your nodes?

2014-03-04 11:37 GMT+01:00 Phil Luckhurst phil.luckhu...@powerassure.com:
 The VMs are hosted on the same ESXi server and they are just running
 Cassandra. We seem to get this happen even if the nodes appear to be idle;
 about 2 to 4 times per hour.

 
 Phil






-- 
Close the World, Open the Net
http://www.linux-wizard.net


Re: Gossip intermittently marks node as DOWN

2014-03-04 Thread Phil Luckhurst
It was created with the default settings so we have 256 per node.


Fabrice Facorat wrote
 From what I understand, this can happen when having many nodes and
 vnodes by node. How many vnodes did you configure on your nodes ?
 


Re: Gossip intermittently marks node as DOWN

2014-03-04 Thread Johnny Miller
What is nodetool tpstats telling you?

On 4 Mar 2014, at 15:10, Phil Luckhurst phil.luckhu...@powerassure.com wrote:

 It was created with the default settings so we have 256 per node.
 
 
 Fabrice Facorat wrote
 From what I understand, this can happen when having many nodes and
 vnodes by node. How many vnodes did you configure on your nodes ?
 



Re: Gossip intermittently marks node as DOWN

2014-03-04 Thread Phil Luckhurst
Here's the tpstats output from both nodes.






Johnny Miller wrote
 What is nodetool tpstats telling you?







Re: Gossip intermittently marks node as DOWN

2014-03-04 Thread Johnny Miller
That looks healthy - nothing blocked or dropped.
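
Editorial sketch: the "nothing blocked or dropped" check can be automated by
parsing the text that nodetool tpstats prints. This assumes the two-table
layout used by Cassandra 2.0 (a "Pool Name" table followed by a
"Message type" table); other versions may print different columns. The
function names are hypothetical.

```python
def parse_tpstats(text):
    """Parse `nodetool tpstats` text into (pools, dropped) dictionaries.

    Assumes a "Pool Name" table whose last five columns are Active, Pending,
    Completed, Blocked and All time blocked, then a "Message type" table
    whose second column is the dropped count.
    """
    pools, dropped = {}, {}
    section = None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if line.startswith("Pool Name"):
            section = "pools"
        elif line.startswith("Message type"):
            section = "dropped"
        elif section == "pools" and len(parts) >= 6:
            name = " ".join(parts[:-5])
            _active, pending, _completed, blocked, all_time = map(int, parts[-5:])
            pools[name] = {
                "pending": pending,
                "blocked": blocked,
                "all_time_blocked": all_time,
            }
        elif section == "dropped" and len(parts) >= 2:
            dropped[parts[0]] = int(parts[1])
    return pools, dropped


def find_problems(pools, dropped):
    """Return names of pools with blocked tasks and message types with drops."""
    problems = [n for n, p in pools.items()
                if p["blocked"] or p["all_time_blocked"]]
    problems += [n for n, count in dropped.items() if count]
    return problems
```

To use it, capture the command output (for example with
subprocess.run(["nodetool", "tpstats"], capture_output=True, text=True)) and
pass its stdout to parse_tpstats(). An empty list from find_problems()
corresponds to the healthy case described above.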



On 4 Mar 2014, at 16:12, Phil Luckhurst phil.luckhu...@powerassure.com wrote:

 Here's the tpstats output from both nodes.
 
 
 
 
 
 
 Johnny Miller wrote
 What is nodetool tpstats telling you?
 
 
 
 
 



Gossip intermittently marks node as DOWN

2014-03-03 Thread Phil Luckhurst
We have a two-node Cassandra 2.0.5 cluster running on a couple of VMware-hosted
virtual machines using Ubuntu 12.04 for testing. As you can see from the log
entries below, the gossip connection between the nodes regularly goes DOWN
and UP. We saw in another post that increasing phi_convict_threshold may
help with this, so we increased it to 12, but we still get the same
problem.

 INFO [GossipTasks:1] 2014-02-28 07:51:10,937 Gossiper.java (line 863) InetAddress /10.150.100.20 is now DOWN
 INFO [HANDSHAKE-/10.150.100.20] 2014-02-28 07:51:10,951 OutboundTcpConnection.java (line 386) Handshaking version with /10.150.100.20
 INFO [RequestResponseStage:898] 2014-02-28 07:51:21,411 Gossiper.java (line 849) InetAddress /10.150.100.20 is now UP
 INFO [HANDSHAKE-/10.150.100.20] 2014-02-28 07:53:52,100 OutboundTcpConnection.java (line 386) Handshaking version with /10.150.100.20
 INFO [GossipTasks:1] 2014-02-28 08:06:52,956 Gossiper.java (line 863) InetAddress /10.150.100.20 is now DOWN
 INFO [HANDSHAKE-/10.150.100.20] 2014-02-28 08:06:52,963 OutboundTcpConnection.java (line 386) Handshaking version with /10.150.100.20
 INFO [RequestResponseStage:915] 2014-02-28 08:07:21,447 Gossiper.java (line 849) InetAddress /10.150.100.20 is now UP
 INFO [HANDSHAKE-/10.150.100.20] 2014-02-28 08:14:09,613 OutboundTcpConnection.java (line 386) Handshaking version with /10.150.100.20

 Has anyone got any suggestions for fixing this? 
  
 Thanks 
 Phil
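
Editorial sketch: to measure how long each outage lasts, the Gossiper
entries in the log excerpt above can be paired up with a small script. The
regular expression is written against the 2.0.5 line format quoted in this
thread and tolerates entries that have been wrapped across lines; the
function name is hypothetical.

```python
import re
from datetime import datetime

# Matches "<timestamp> Gossiper.java (line N) InetAddress <addr> is now UP|DOWN",
# allowing arbitrary whitespace (including line breaks) between the fields.
STATE_CHANGE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\s+Gossiper\.java\s+"
    r"\(line\s+\d+\)\s+InetAddress\s+(\S+)\s+is\s+now\s+(UP|DOWN)"
)


def down_periods(log_text):
    """Return a list of (address, down_at, seconds_down) tuples.

    Each DOWN entry is paired with the next UP entry for the same address.
    A DOWN with no later UP is ignored.
    """
    went_down = {}
    periods = []
    for stamp, address, state in STATE_CHANGE.findall(log_text):
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f")
        if state == "DOWN":
            went_down[address] = when
        elif address in went_down:
            start = went_down.pop(address)
            periods.append((address, start, (when - start).total_seconds()))
    return periods
```

Passing the contents of system.log to down_periods() lists each outage. For
the two outages in the excerpt above, the durations work out to roughly 10.5
and 28.5 seconds.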


