[
https://issues.apache.org/jira/browse/CASSANDRA-6571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13868502#comment-13868502
]
sankalp kohli commented on CASSANDRA-6571:
------------------------------------------
The problem is that we add the endpoint state of the newly discovered
endpoint to endpointStateMap in the handleMajorStateChange method, and that
endpoint state starts with isAlive=true because gossip reports the endpoint
as alive.
So if the echo fails, realMarkAlive (a newly refactored method in trunk) will
never run.
The fix is to set isAlive=false before sending the echo message. We should do
this because we now rely on the echo message to mark anything alive, so the
endpoint should be marked dead even if we hear from another node that it is
alive.
Proposed fix in code:
private void markAlive(final InetAddress addr, final EndpointState localState)
{
    if (MessagingService.instance().getVersion(addr) < MessagingService.VERSION_20)
    {
        realMarkAlive(addr, localState);
        return;
    }
+   localState.markDead();
    MessageOut<EchoMessage> echoMessage = new MessageOut<EchoMessage>(MessagingService.Verb.ECHO,
                                                                      new EchoMessage(),
                                                                      EchoMessage.serializer);
    logger.trace("Sending a EchoMessage to {}", addr);
    IAsyncCallback echoHandler = new IAsyncCallback()
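    // (Sketch of the remainder of the method for context; this part is
    // unchanged by the patch and the exact shape may differ slightly from
    // trunk. The point is that realMarkAlive is only invoked from the echo
    // response callback, so with markDead() above a lost echo leaves the
    // endpoint down until a later gossip round retries, instead of leaving
    // it stuck with isAlive=true.)
    {
        public boolean isLatencyForSnitch()
        {
            return false;
        }

        public void response(MessageIn msg)
        {
            // Echo round trip succeeded; now it is safe to mark the endpoint up.
            realMarkAlive(addr, localState);
        }
    };
    MessagingService.instance().sendRR(echoMessage, addr, echoHandler);
}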
> Quickly restarted nodes can list others as down indefinitely
> ------------------------------------------------------------
>
> Key: CASSANDRA-6571
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6571
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Richard Low
> Assignee: Vijay
> Labels: gossip
> Fix For: 2.0.5
>
>
> In a healthy cluster, if a node is restarted quickly, it may list other nodes
> as down when it comes back up and never list them as up. I reproduced it on
> a small cluster running in Docker containers.
> 1. Have a healthy 5-node cluster:
> {quote}
> $ nodetool status
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address          Load      Tokens  Owns (effective)  Host ID                               Rack
> UN  192.168.100.1    40.88 KB  256     38.3%             92930ef6-1b29-49f0-a8cd-f962b55dca1b  rack1
> UN  192.168.100.254  80.63 KB  256     39.6%             ef15a717-9d60-48fb-80a9-e0973abdd55e  rack1
> UN  192.168.100.3    87.78 KB  256     40.8%             4e6765db-97ed-4429-a9f4-8e29de247f18  rack1
> UN  192.168.100.2    75.22 KB  256     40.6%             e89bc581-5345-4abd-88ba-7018371940fc  rack1
> UN  192.168.100.4    80.83 KB  256     40.8%             466a9798-d484-44f0-aae8-bb2b78d80331  rack1
> {quote}
> 2. Kill a node and restart it quickly:
> bq. kill -9 <pid> && start-cassandra
> 3. Wait for the node to come back; more often than not, it lists one or
> more of the other nodes as down indefinitely:
> {quote}
> $ nodetool status
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address          Load      Tokens  Owns (effective)  Host ID                               Rack
> UN  192.168.100.1    40.88 KB  256     38.3%             92930ef6-1b29-49f0-a8cd-f962b55dca1b  rack1
> UN  192.168.100.254  80.63 KB  256     39.6%             ef15a717-9d60-48fb-80a9-e0973abdd55e  rack1
> DN  192.168.100.3    87.78 KB  256     40.8%             4e6765db-97ed-4429-a9f4-8e29de247f18  rack1
> DN  192.168.100.2    75.22 KB  256     40.6%             e89bc581-5345-4abd-88ba-7018371940fc  rack1
> DN  192.168.100.4    80.83 KB  256     40.8%             466a9798-d484-44f0-aae8-bb2b78d80331  rack1
> {quote}
> From trace logging, here's what I think is going on:
> 1. The nodes are all happy gossiping
> 2. Restart node X. When it comes back up it starts gossiping with the other
> nodes.
> 3. Before node X marks node Y as alive, X sends Y an echo message (introduced
> in CASSANDRA-3533).
> 4. The echo message is received by Y. To reply, Y attempts to reuse an
> existing connection to X. That connection is dead, but the reply is attempted
> on it anyway and fails.
> 5. X never receives the echo back, so Y isn't marked as alive.
> 6. X gossips to Y again, but because the endpoint's isAlive() returns true,
> it never calls markAlive() to properly set Y as alive (illustrated in the
> sketch after this description).
> I tried to fix this by defaulting isAlive=false in the constructor of
> EndpointState. This made it less likely for a node to be marked down, but it
> still happens.
> The workaround is to leave a node down for a while so the connections die on
> the remaining nodes.
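To make step 6 above concrete, here is a minimal hypothetical sketch of the
gate that the stale isAlive=true state short-circuits. The method and argument
names are illustrative only, not the actual Gossiper code:

// Hypothetical illustration of step 6: once the local EndpointState already
// claims the endpoint is alive, markAlive() is never attempted again, so a
// lost echo leaves the node listed as DN indefinitely.
void onRemoteStateReceived(InetAddress ep, EndpointState localState, EndpointState remoteState)
{
    if (remoteState.isAlive() && !localState.isAlive())
        markAlive(ep, localState);   // unreachable while localState.isAlive() == true
}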
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)