[ 
https://issues.apache.org/jira/browse/CASSANDRA-12653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15874646#comment-15874646
 ] 

Stefan Podkowinski commented on CASSANDRA-12653:
------------------------------------------------

Thanks for your comments, Joel! I've rebased the patch and updated it following 
your suggestions, except for the issue discussed below. 

I've also removed two @VisibleForTesting annotations in 2.2, since the related 
test isn't available in that branch.

bq. On all versions, using firstSynSendAt == 0 to check if it has been 
initialized isn't entirely safe. It's entirely legal (although admittedly rare) 
for System.nanoTime to return 0. If this happened, all acks would be rejected.

In this very rare case we'd only discard the acks of one or two nodes from a 
single gossip round, as firstSynSendAt would then be set by the next periodic 
sendGossip call. I'd therefore prefer to leave the variable as is, apart from 
making it volatile.
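To illustrate the trade-off, here is a minimal, hypothetical Java model of the check (ShadowAckFilter, onSynSent and shouldAcceptAck are names made up for this sketch, not the patch's code). Timestamps are passed in instead of being read from System.nanoTime(), so the rare "nanoTime() returned 0" case can be reproduced. It assumes a single gossip task thread calls onSynSent, so the unlocked check-then-set is safe and volatile only provides visibility to the thread handling acks.

```java
/**
 * Hypothetical, simplified model of the firstSynSendAt check discussed above.
 * Timestamps are passed in explicitly instead of calling System.nanoTime(),
 * so the rare "nanoTime() returned 0" case can be reproduced.
 */
class ShadowAckFilter {
    // 0 doubles as the "not yet initialized" sentinel. volatile makes the
    // value written by the gossip task visible to the thread handling acks.
    private volatile long firstSynSendAt = 0L;

    /**
     * Called by the periodic gossip task before sending a regular SYN.
     * Assumes a single gossip thread, so check-then-set needs no locking.
     */
    void onSynSent(long nanoNow) {
        if (firstSynSendAt == 0L) {
            // If nanoNow happens to be 0, the field still looks uninitialized
            // and is simply set again on the next gossip round.
            firstSynSendAt = nanoNow;
        }
    }

    /** Returns true if an ack handled at nanoNow belongs to regular gossip. */
    boolean shouldAcceptAck(long nanoNow) {
        long first = firstSynSendAt; // read the volatile field once
        // Compare by subtraction, as System.nanoTime() values may overflow.
        return first != 0L && nanoNow - first >= 0;
    }

    public static void main(String[] args) {
        ShadowAckFilter filter = new ShadowAckFilter();
        System.out.println(filter.shouldAcceptAck(100L));   // false: no SYN sent yet
        filter.onSynSent(0L);                                // rare case: nanoTime() == 0
        System.out.println(filter.shouldAcceptAck(100L));   // false: still looks unset
        filter.onSynSent(1_000L);                            // next periodic round
        System.out.println(filter.shouldAcceptAck(1_500L)); // true
    }
}
```

In this model a SYN sent at time 0 leaves the field looking uninitialized, so acks are only rejected until the next periodic round sets it, which is the bounded loss described above. A separate boolean flag would remove the edge case, at the cost of a second volatile field that has to be kept consistent with the timestamp.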


||2.2||3.0||3.11||trunk||
|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-12653-2.2]|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-12653-3.0]|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-12653-3.11]|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-12653-trunk]|
|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-12653-2.2-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-12653-3.0-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-12653-3.11-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-12653-trunk-dtest/]|
|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-12653-2.2-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-12653-3.0-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-12653-3.11-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-12653-trunk-testall/]|



> In-flight shadow round requests
> -------------------------------
>
>                 Key: CASSANDRA-12653
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12653
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Distributed Metadata
>            Reporter: Stefan Podkowinski
>            Assignee: Stefan Podkowinski
>            Priority: Minor
>             Fix For: 2.2.x, 3.0.x, 3.11.x, 4.x
>
>         Attachments: 12653-2.2.patch, 12653-3.0.patch, 12653-trunk.patch
>
>
> Bootstrapping or replacing a node in the cluster requires gathering and 
> checking some host IDs or tokens by doing a gossip "shadow round" once before 
> joining the cluster. This is done by sending a gossip SYN to all seeds until 
> we receive a response with the cluster state, from which we can move on in the 
> bootstrap process. Receiving a response marks the shadow round as done and 
> calls {{Gossiper.resetEndpointStateMap}} to clean up the received state 
> again.
> The issue here is that at this point there might be other in-flight requests, 
> and it's very likely that shadow round responses from other seeds will be 
> received afterwards, while the current state of the bootstrap process doesn't 
> expect this to happen (e.g. gossiper may or may not be enabled). 
> One side effect is that a MigrationTask is spawned for each shadow round 
> reply except the first. Each task may or may not execute, depending on whether 
> {{Gossiper.resetEndpointStateMap}} has already been called at execution time, 
> which affects the outcome of {{FailureDetector.instance.isAlive(endpoint)}} at 
> the start of the task. When this happens, you'll see error log messages such 
> as the following:
> {noformat}
> INFO  [SharedPool-Worker-1] 2016-09-08 08:36:39,255 Gossiper.java:993 - InetAddress /xx.xx.xx.xx is now UP
> ERROR [MigrationStage:1]    2016-09-08 08:36:39,255 FailureDetector.java:223 - unknown endpoint /xx.xx.xx.xx
> {noformat}
> Although it isn't pretty, I currently don't see any serious harm from this, 
> but it would be good to get a second opinion (feel free to close as "Won't 
> Fix").
> /cc [~Stefania] [~thobbs]
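The shadow round flow described in the issue, together with the idea of dropping replies that arrive after the round has completed, can be sketched as a small, hypothetical Java model. ShadowRound, onReply, finishAndReset and lateReplies are illustrative names, not Cassandra's Gossiper API; seeds and endpoint state are simplified to strings, and no real messaging takes place.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model of a gossip shadow round (illustrative only, not
 * Cassandra's Gossiper). Endpoints and states are plain strings.
 */
class ShadowRound {
    private boolean inShadowRound = false;
    private final Map<String, String> receivedState = new HashMap<>();
    private final List<String> lateReplies = new ArrayList<>();

    /** Begins the round and returns the seeds a SYN would be sent to. */
    synchronized List<String> start(List<String> seeds) {
        inShadowRound = true;
        return new ArrayList<>(seeds);
    }

    /**
     * Handles a shadow round reply. Only the first reply completes the round;
     * replies that were still in flight are recorded and otherwise ignored.
     */
    synchronized boolean onReply(String seed, Map<String, String> state) {
        if (!inShadowRound) {
            lateReplies.add(seed);
            return false;
        }
        receivedState.putAll(state);
        inShadowRound = false;
        return true;
    }

    /** Returns the collected state and clears it, like resetting the endpoint state map. */
    synchronized Map<String, String> finishAndReset() {
        Map<String, String> copy = new HashMap<>(receivedState);
        receivedState.clear();
        return copy;
    }

    /** Seeds whose replies arrived after the round was already done. */
    synchronized List<String> lateReplies() {
        return new ArrayList<>(lateReplies);
    }

    public static void main(String[] args) {
        ShadowRound round = new ShadowRound();
        List<String> targets = round.start(List.of("seed1", "seed2", "seed3"));
        System.out.println(targets);                                                  // [seed1, seed2, seed3]
        System.out.println(round.onReply("seed2", Map.of("10.0.0.1", "host-id-1"))); // true
        System.out.println(round.onReply("seed1", Map.of("10.0.0.1", "host-id-1"))); // false
        System.out.println(round.onReply("seed3", Map.of()));                        // false
        System.out.println(round.finishAndReset());                                  // {10.0.0.1=host-id-1}
        System.out.println(round.lateReplies());                                     // [seed1, seed3]
    }
}
```

In this model the late replies from seed1 and seed3 are recorded and discarded instead of being applied to the endpoint state, which is the point where, per the description above, the real code would spawn extra MigrationTasks. A synchronized flag is used for simplicity; it guarantees that exactly one reply completes the round.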



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
