[ 
https://issues.apache.org/jira/browse/CASSANDRA-20659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17953209#comment-17953209
 ] 

David Capwell commented on CASSANDRA-20659:
-------------------------------------------

Thanks for the reply [~samt]!  The issue was found in 4.1 and the first patch 
was written there, but my habit is to patch trunk and then backport (I was about 
to backport to 4.0 just now, as Brandon and Blake both gave a thumbs up on the 
direction).

bq. Gossip state is now the source of truth only for transient node state like 
LOAD & RPC_READY.

And SEVERITY =)

The trunk patch makes this logic more likely to progress, but to your point the 
impact is mostly to 4.x and 5.0; TCM solves all problems =D.

Gossip is mostly in sync across all branches (other than shadow), so the patch is 
likely to be very much the same... I'll get the 4.x / 5.0 patches out today.

> Gossip doesn't converge due to race condition when updating EndpointStates 
> multiple fields
> ------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-20659
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-20659
>             Project: Apache Cassandra
>          Issue Type: Bug
>          Components: Cluster/Gossip
>            Reporter: David Capwell
>            Assignee: David Capwell
>            Priority: Normal
>             Fix For: 4.1.x, 5.0.x, 5.x
>
>          Time Spent: 40m
>  Remaining Estimate: 0h
>
> The issue seen is that during shrinks or token moves the cluster gets into a 
> state where some of the nodes never converge and never see the latest STATUS 
> state for the changed peers.
> In testing this it was found that:
> 1) org.apache.cassandra.gms.Gossiper#applyStateLocally expects to run in a 
> single thread, so doesn't take any locks
> 2) org.apache.cassandra.gms.Gossiper.GossipTask runs in another thread and 
> uses a taskLock to avoid sending partial state
> 3) org.apache.cassandra.gms.Gossiper#applyNewStates gets called when the 
> generation matches, and tries to apply the state sequentially.
> The theory (and test) is:
> 1) localState.setHeartBeatState(remoteState.getHeartBeatState()); runs
> 2) something (gossip or paxos) reads the state
> 3) localState.addApplicationStates(updatedStates); updates the state
> The "something" in step 2 sends around the heartbeat, which causes others to 
> see a higher max version, so the delta logic won't see the mutations done in 
> step 3.
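
The three-step theory above can be reproduced in a small, single-threaded 
sketch. Everything in it is hypothetical and simplified for illustration 
(GossipRaceSketch, EndpointState, Versioned, deltaSince, and the version 
numbers are not Cassandra's classes or values); it only models "heartbeat 
applied, state read, application states applied" and the effect on a peer 
that asks for anything newer than the max version it has already seen. The 
"atomic" branch is included only for comparison and is not meant to describe 
what the actual patch does.

```java
import java.util.HashMap;
import java.util.Map;

public class GossipRaceSketch {
    // Hypothetical stand-in for a versioned application state such as STATUS.
    static final class Versioned {
        final String value;
        final int version;
        Versioned(String value, int version) { this.value = value; this.version = version; }
    }

    // Hypothetical, simplified endpoint state: one heartbeat version plus app states.
    static final class EndpointState {
        int heartbeatVersion;
        final Map<String, Versioned> appStates = new HashMap<>();

        EndpointState copy() {
            EndpointState c = new EndpointState();
            c.heartbeatVersion = heartbeatVersion;
            c.appStates.putAll(appStates);
            return c;
        }

        int maxVersion() {
            int max = heartbeatVersion;
            for (Versioned v : appStates.values()) max = Math.max(max, v.version);
            return max;
        }

        // Only states strictly newer than what the requester claims to know.
        Map<String, Versioned> deltaSince(int knownMaxVersion) {
            Map<String, Versioned> delta = new HashMap<>();
            for (Map.Entry<String, Versioned> e : appStates.entrySet())
                if (e.getValue().version > knownMaxVersion) delta.put(e.getKey(), e.getValue());
            return delta;
        }
    }

    // Returns the STATUS value a third node (the observer) ends up with.
    static String run(boolean atomic) {
        EndpointState remote = new EndpointState();   // what the changed peer advertises
        remote.heartbeatVersion = 12;
        remote.appStates.put("STATUS", new Versioned("LEFT", 11));

        EndpointState local = new EndpointState();    // node applying the update
        local.heartbeatVersion = 5;
        local.appStates.put("STATUS", new Versioned("NORMAL", 4));

        EndpointState snapshot;
        if (atomic) {
            local.heartbeatVersion = remote.heartbeatVersion;
            local.appStates.putAll(remote.appStates);
            snapshot = local.copy();                  // reader sees a complete update
        } else {
            local.heartbeatVersion = remote.heartbeatVersion; // step 1
            snapshot = local.copy();                          // step 2: partial state is read
            local.appStates.putAll(remote.appStates);         // step 3
        }

        // The observer receives the snapshot and remembers its max version.
        EndpointState observer = snapshot.copy();
        int known = observer.maxVersion();
        // Later the observer asks for anything newer than that version.
        observer.appStates.putAll(local.deltaSince(known));
        return observer.appStates.get("STATUS").value;
    }

    public static void main(String[] args) {
        System.out.println("interleaved: " + run(false)); // prints "interleaved: NORMAL"
        System.out.println("atomic:      " + run(true));  // prints "atomic:      LEFT"
    }
}
```

In the interleaved run the observer records a max version of 12 from the 
heartbeat alone, so the STATUS change at version 11 is never included in a 
later delta and the observer stays on the stale value.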



