[ 
https://issues.apache.org/jira/browse/CASSANDRA-6571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13869608#comment-13869608
 ] 

Brandon Williams commented on CASSANDRA-6571:
---------------------------------------------

This patch solves the issue outlined in the description for me.  I created a 
firewall between two nodes and they do mark each other as dead, despite being 
able to see each other through a third node.  What doesn't work, though, is that 
if the partition is temporary and heals, the nodes never mark each other up, even 
though the connection has been (re)established, as in Vijay's point 2).  
However, that is a separate problem that has always existed, so let's move that 
to another ticket.  Committed Sankalp's patch.

> Quickly restarted nodes can list others as down indefinitely
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-6571
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6571
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Richard Low
>            Assignee: sankalp kohli
>              Labels: gossip
>             Fix For: 2.0.5
>
>         Attachments: 6571.txt
>
>
> In a healthy cluster, if a node is restarted quickly, it may list other nodes 
> as down when it comes back up and never list them as up.  I reproduced it on 
> a small cluster running in Docker containers.
> 1. Have a healthy 5 node cluster:
> {quote}
> $ nodetool status
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address          Load       Tokens  Owns (effective)  Host ID                               Rack
> UN  192.168.100.1    40.88 KB   256     38.3%             92930ef6-1b29-49f0-a8cd-f962b55dca1b  rack1
> UN  192.168.100.254  80.63 KB   256     39.6%             ef15a717-9d60-48fb-80a9-e0973abdd55e  rack1
> UN  192.168.100.3    87.78 KB   256     40.8%             4e6765db-97ed-4429-a9f4-8e29de247f18  rack1
> UN  192.168.100.2    75.22 KB   256     40.6%             e89bc581-5345-4abd-88ba-7018371940fc  rack1
> UN  192.168.100.4    80.83 KB   256     40.8%             466a9798-d484-44f0-aae8-bb2b78d80331  rack1
> {quote}
> 2. Kill a node and restart it quickly:
> bq. kill -9 <pid> && start-cassandra
> 3. Wait for the node to come back; more often than not, it lists one or more 
> other nodes as down indefinitely:
> {quote}
> $ nodetool status
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address          Load       Tokens  Owns (effective)  Host ID                               Rack
> UN  192.168.100.1    40.88 KB   256     38.3%             92930ef6-1b29-49f0-a8cd-f962b55dca1b  rack1
> UN  192.168.100.254  80.63 KB   256     39.6%             ef15a717-9d60-48fb-80a9-e0973abdd55e  rack1
> DN  192.168.100.3    87.78 KB   256     40.8%             4e6765db-97ed-4429-a9f4-8e29de247f18  rack1
> DN  192.168.100.2    75.22 KB   256     40.6%             e89bc581-5345-4abd-88ba-7018371940fc  rack1
> DN  192.168.100.4    80.83 KB   256     40.8%             466a9798-d484-44f0-aae8-bb2b78d80331  rack1
> {quote}
> From trace logging, here's what I think is going on:
> 1. The nodes are all happy gossiping
> 2. Restart node X. When it comes back up it starts gossiping with the other 
> nodes.
> 3. Before node X marks node Y as alive, X sends an echo message (introduced 
> in CASSANDRA-3533)
> 4. The echo message is received by Y. To reply, Y attempts to reuse an existing 
> connection to X. That connection is dead, but the reply is sent over it anyway 
> and fails.
> 5. X never receives the echo back, so Y isn't marked as alive.
> 6. X gossips to Y again, but because the endpoint isAlive() returns true, it 
> never calls markAlive() to properly set Y as alive.
> I tried to fix this by defaulting isAlive=false in the constructor of 
> EndpointState. This made the problem less likely, but it still happens.
> The workaround is to leave a node down for a while so the connections die on 
> the remaining nodes.


