ring state out of sync in build 883477

B. Todd Burruss Mon, 23 Nov 2009 16:39:10 -0800

i'm observing the following on a cluster that started with 4 nodes.  i
have been killing and restarting various nodes as i test cassandra, and
now i'm seeing a lot of NotFoundExceptions in the client.  i believe the
cause is ring state that is out of sync between the two nodes that are
still up and available.  the first ring state shown below reflects the
current state of the cluster; the second, from a different node, lists
only the two nodes that are up.  i have also seen a similar issue where
one node thinks another node is still available when in fact it has been
killed.  it seems to be related to bringing nodes up and killing them too
fast, without giving the others time to figure out that a node is
"dead".  in that case i see TimedOutExceptions related to the NIO
SocketChannel class.


thx!

[cassandra.883477]$ bin/nodeprobe -host gen-app02.dev.real.com -port 8080 ring
Address         Status  Load       Range                                      Ring
                                   144038903974614862325597275257769797985
172.27.128.186  Down    22.17 MB   31124469348629903091013930339840898757     |<--|
172.27.128.23   Down    22.17 MB   64378740291415296162944450043143967518     |   |
172.27.128.22   Up      22.17 MB   121134220722269938669001112695509564769    |   |
172.27.128.185  Up      14.69 MB   144038903974614862325597275257769797985    |-->|

[cassandra.883477]$ bin/nodeprobe -host vmguest85.prognet.com -port 8080 ring
Address         Status  Load       Range                                      Ring
                                   144038903974614862325597275257769797985
172.27.128.22   Up      22.17 MB   121134220722269938669001112695509564769    |<--|
172.27.128.185  Up      14.69 MB   144038903974614862325597275257769797985    |-->|
[cassandra.883477]$
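
For anyone wanting to compare ring views from several nodes without reading them by eye, here is a minimal sketch in Python. It is not part of cassandra or nodeprobe and does not invoke them; it only works on pasted `ring` output. The names `parse_ring` and `diff_rings` are hypothetical, and the regex assumes each node line carries an IPv4 address directly followed by `Up` or `Down`, as in the two dumps above.

```python
import re

# Assumption: a node line holds an IPv4 address followed (with or without
# whitespace) by its status, e.g. "172.27.128.186Down" or "172.27.128.22 Up".
NODE_RE = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3})\s*(Up|Down)")


def parse_ring(text):
    """Return {address: status} for every node found in a pasted ring dump."""
    return dict(NODE_RE.findall(text))


def diff_rings(view_a, view_b):
    """Return {address: (status_in_a, status_in_b)} where the two views disagree.

    A node that is absent from one view is reported with status None.
    """
    diffs = {}
    for addr in sorted(set(view_a) | set(view_b)):
        a, b = view_a.get(addr), view_b.get(addr)
        if a != b:
            diffs[addr] = (a, b)
    return diffs


# Node lines copied from the two dumps in this message.
view_gen_app02 = parse_ring("""
172.27.128.186Down       22.17 MB
172.27.128.23 Down       22.17 MB
172.27.128.22 Up         22.17 MB
172.27.128.185Up         14.69 MB
""")
view_vmguest85 = parse_ring("""
172.27.128.22 Up         22.17 MB
172.27.128.185Up         14.69 MB
""")

for addr, (a, b) in diff_rings(view_gen_app02, view_vmguest85).items():
    print(addr, "| gen-app02 sees:", a, "| vmguest85 sees:", b)
```

With the dumps above, the only disagreements are the two nodes that gen-app02 lists as Down and vmguest85 does not list at all. The sketch compares membership and status only; it ignores tokens and load.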

