[ 
https://issues.apache.org/jira/browse/CASSANDRA-12653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15513406#comment-15513406
 ] 

Stefan Podkowinski commented on CASSANDRA-12653:
------------------------------------------------


The attached patch will solve this issue by using a separate data structure for 
the information gathered during shadow rounds and by using a timestamp for the 
first syn send. 

||trunk||3.0||2.2||
|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-12653-trunk]|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-12653-3.0]|[branch|https://github.com/spodkowinski/cassandra/tree/CASSANDRA-12653-2.2]|
|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-12653-trunk-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-12653-3.0-dtest/]|[dtest|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-12653-2.2-dtest/]|
|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-12653-trunk-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-12653-3.0-testall/]|[testall|http://cassci.datastax.com/view/Dev/view/spodkowinski/job/spodkowinski-CASSANDRA-12653-2.2-testall/]|


I can also create patches for merge conflicts if needed after agreeing on which 
versions to target.


> In-flight shadow round requests
> -------------------------------
>
>                 Key: CASSANDRA-12653
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12653
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Distributed Metadata
>            Reporter: Stefan Podkowinski
>            Assignee: Stefan Podkowinski
>            Priority: Minor
>
> Bootstrapping or replacing a node in the cluster requires to gather and check 
> some host IDs or tokens by doing a gossip "shadow round" once before joining 
> the cluster. This is done by sending a gossip SYN to all seeds until we 
> receive a response with the cluster state, from where we can move on in the 
> bootstrap process. Receiving a response will call the shadow round done and 
> calls {{Gossiper.resetEndpointStateMap}} for cleaning up the received state 
> again.
> The issue here is that at this point there might be other in-flight requests 
> and it's very likely that shadow round responses from other seeds will be 
> received afterwards, while the current state of the bootstrap process doesn't 
> expect this to happen (e.g. gossiper may or may not be enabled). 
> One side effect will be that MigrationTasks are spawned for each shadow round 
> reply except the first. Tasks might or might not execute based on whether at 
> execution time {{Gossiper.resetEndpointStateMap}} had been called, which 
> effects the outcome of {{FailureDetector.instance.isAlive(endpoint))}} at 
> start of the task. You'll see error log messages such as follows when this 
> happend:
> {noformat}
> INFO  [SharedPool-Worker-1] 2016-09-08 08:36:39,255 Gossiper.java:993 - 
> InetAddress /xx.xx.xx.xx is now UP
> ERROR [MigrationStage:1]    2016-09-08 08:36:39,255 FailureDetector.java:223 
> - unknown endpoint /xx.xx.xx.xx
> {noformat}
> Although is isn't pretty, I currently don't see any serious harm from this, 
> but it would be good to get a second opinion (feel free to close as "wont 
> fix").
> /cc [~Stefania] [~thobbs]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to