Stefan Podkowinski created CASSANDRA-12653:

             Summary: In-flight shadow round requests
                 Key: CASSANDRA-12653
             Project: Cassandra
          Issue Type: Bug
          Components: Distributed Metadata
            Reporter: Stefan Podkowinski
            Priority: Minor

Bootstrapping or replacing a node in the cluster requires gathering and checking
some host IDs or tokens by doing a gossip "shadow round" once before joining
the cluster. This is done by sending a gossip SYN to all seeds until we receive
a response with the cluster state, after which we can move on in the bootstrap
process. Receiving a response marks the shadow round as done and calls
{{Gossiper.resetEndpointStateMap}} to clean up the received state again.
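As a rough illustration of the round described above, here is a minimal synchronous model. It is a sketch under stated assumptions, not Cassandra's implementation: the class `ShadowRoundSketch`, the `sendSyn` function (a SYN modeled as a call that returns the seed's state, or null for no reply) and the plain map standing in for the gossiper's endpoint state are all hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical model of a gossip "shadow round". Names are illustrative only;
 * SYNs are modeled as synchronous calls that return a seed's state or null.
 */
public class ShadowRoundSketch {
    // Stand-in for the gossiper's endpoint state map: endpoint -> state (host ID, tokens, ...)
    final Map<String, String> endpointStateMap = new HashMap<>();

    /** Sends a SYN to every seed per attempt; the round completes on the first reply. */
    Map<String, String> runShadowRound(List<String> seeds,
                                       Function<String, Map<String, String>> sendSyn,
                                       int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Map<String, String> first = null;
            for (String seed : seeds) {
                Map<String, String> reply = sendSyn.apply(seed); // null = no reply
                if (first == null && reply != null) {
                    first = reply; // later replies are simply ignored in this synchronous model
                }
            }
            if (first != null) {
                endpointStateMap.putAll(first);
                // Copy the state needed for the bootstrap checks (host IDs, tokens)...
                Map<String, String> snapshot = new HashMap<>(endpointStateMap);
                // ...then clean up, analogous to Gossiper.resetEndpointStateMap()
                endpointStateMap.clear();
                return snapshot;
            }
        }
        throw new IllegalStateException("no seed replied to the shadow round");
    }

    public static void main(String[] args) {
        ShadowRoundSketch g = new ShadowRoundSketch();
        Map<String, String> state = g.runShadowRound(
                List.of("seed1", "seed2"),
                seed -> seed.equals("seed2") ? Map.of("10.0.0.1", "hostId=a") : null,
                3);
        System.out.println(state);                        // {10.0.0.1=hostId=a}
        System.out.println(g.endpointStateMap.isEmpty()); // true
    }
}
```

In this synchronous model the extra replies can be dropped trivially; with real asynchronous messaging they may still arrive after the round has been declared done, which is the problem described next.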

The issue here is that at this point there might be other requests still in
flight, and it's very likely that shadow round responses from other seeds will
be received afterwards, while the bootstrap process in its current state doesn't
expect this to happen (e.g. the gossiper may or may not be enabled).
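One possible way to reason about the late replies is a single atomic "in shadow round" flag that only the first reply may flip. The sketch below is hypothetical: it is not the fix adopted for this ticket, and `LateReplyGuardSketch` and `onShadowRoundReply` are illustrative names. It delivers replies from several seeds concurrently and shows that exactly one completes the round while the rest are dropped instead of being handled as regular gossip.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical guard against in-flight shadow round replies.
 * Not Cassandra's code; names are illustrative.
 */
public class LateReplyGuardSketch {
    final AtomicBoolean inShadowRound = new AtomicBoolean(true);
    final AtomicInteger accepted = new AtomicInteger();
    final AtomicInteger dropped = new AtomicInteger();

    /** Called for every reply from a seed, possibly on many threads. */
    void onShadowRoundReply(String seed) {
        // compareAndSet lets exactly one reply end the round, however replies interleave
        if (inShadowRound.compareAndSet(true, false)) {
            accepted.incrementAndGet(); // use this state, then reset the endpoint state map
        } else {
            dropped.incrementAndGet();  // late reply: ignore it (e.g. spawn no migration task)
        }
    }

    public static void main(String[] args) throws InterruptedException {
        LateReplyGuardSketch g = new LateReplyGuardSketch();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (String seed : List.of("seed1", "seed2", "seed3")) {
            pool.submit(() -> g.onShadowRoundReply(seed));
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        // prints "1 accepted, 2 dropped"
        System.out.println(g.accepted.get() + " accepted, " + g.dropped.get() + " dropped");
    }
}
```

An atomic compare-and-set is used rather than a plain boolean check so that two replies arriving at the same moment cannot both be treated as the first.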

One side effect is that a MigrationTask is spawned for each shadow round reply
except the first. These tasks may or may not execute, depending on whether
{{Gossiper.resetEndpointStateMap}} had already been called at execution time,
which affects the outcome of {{FailureDetector.instance.isAlive(endpoint)}} at
the start of the task. You'll see error log messages such as the following when
this happens:

INFO  [SharedPool-Worker-1] 2016-09-08 08:36:39,255 - InetAddress /xx.xx.xx.xx is now UP
ERROR [MigrationStage:1]    2016-09-08 08:36:39,255 - unknown endpoint /xx.xx.xx.xx
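The timing dependence can be modeled as follows: the liveness check runs when the task executes, not when it is scheduled, so the same task succeeds or fails depending on whether the reset happened in between. This is a hypothetical model. `MigrationTaskRaceSketch`, its set of live endpoints and `runMigrationTask` are illustrative stand-ins for the gossiper state, `FailureDetector.instance.isAlive` and the migration task body.

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical model of the MigrationTask race. The liveness check happens at
 * execution time, so the result depends on whether the reset ran first.
 */
public class MigrationTaskRaceSketch {
    // Stand-in for the gossip state that the liveness check consults
    final Set<String> knownLiveEndpoints = new HashSet<>();

    boolean isAlive(String endpoint) {
        return knownLiveEndpoints.contains(endpoint);
    }

    /** Analogous to Gossiper.resetEndpointStateMap(): forget the shadow round state. */
    void resetEndpointStateMap() {
        knownLiveEndpoints.clear();
    }

    /** Body of a queued migration task; note the check runs when the task runs. */
    String runMigrationTask(String endpoint) {
        if (!isAlive(endpoint)) {
            return "ERROR unknown endpoint " + endpoint; // corresponds to the error logged above
        }
        return "pulling schema from " + endpoint;
    }

    public static void main(String[] args) {
        String ep = "/10.0.0.2";

        // Ordering A: the task runs before the reset, so it executes.
        MigrationTaskRaceSketch a = new MigrationTaskRaceSketch();
        a.knownLiveEndpoints.add(ep); // a late reply marks the endpoint UP
        System.out.println(a.runMigrationTask(ep)); // pulling schema from /10.0.0.2
        a.resetEndpointStateMap();

        // Ordering B: the reset happens first, so the same task fails.
        MigrationTaskRaceSketch b = new MigrationTaskRaceSketch();
        b.knownLiveEndpoints.add(ep);
        b.resetEndpointStateMap();
        System.out.println(b.runMigrationTask(ep)); // ERROR unknown endpoint /10.0.0.2
    }
}
```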

Although it isn't pretty, I currently don't see any serious harm from this, but
it would be good to get a second opinion (feel free to close as "Won't Fix").

/cc [~Stefania] [~thobbs]

This message was sent by Atlassian JIRA
