[ 
https://issues.apache.org/jira/browse/CASSANDRA-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16801208#comment-16801208
 ] 

Ariel Weisberg commented on CASSANDRA-15059:
--------------------------------------------

[You don't need to create a {{ListenableFutureTask}}; you can just {{submit}} the 
{{Runnable}} to the {{ExecutorService}} and it will return a 
{{Future}}|https://github.com/apache/cassandra/compare/cassandra-3.0...bdeggleston:15059-3.0?expand=1#diff-8be666c70553b1f0017a01458c490f47R332].
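
To illustrate the point about {{submit}}, here is a minimal sketch that is not taken from the patch. The class name {{SubmitSketch}} and the single-threaded executor standing in for the gossip stage are assumptions made for the example; only {{ExecutorService.submit(Runnable)}} and {{Future}} are the standard JDK API.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;

final class SubmitSketch
{
    // Returns true if the Runnable ran to completion on the executor.
    static boolean runOnExecutor()
    {
        // Hypothetical stand-in for the single-threaded gossip stage.
        ExecutorService stage = Executors.newSingleThreadExecutor();
        try
        {
            AtomicBoolean ran = new AtomicBoolean(false);
            Runnable task = () -> ran.set(true);

            // submit() wraps the Runnable itself, so no task object
            // has to be constructed by the caller.
            Future<?> future = stage.submit(task);

            // Blocks until the task has run; the result of a Runnable is null.
            future.get();
            return future.isDone() && ran.get();
        }
        catch (InterruptedException | ExecutionException e)
        {
            throw new IllegalStateException(e);
        }
        finally
        {
            stage.shutdown();
        }
    }
}
```

Note that the plain {{Future}} returned here can only be polled or blocked on; it has no way to attach a listener.
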
[This creates significant risk of deadlock if the Gossiper ends up needing to 
acquire something owned by the thread submitting the task. Document that risk 
here.|https://github.com/apache/cassandra/compare/cassandra-3.0...bdeggleston:15059-3.0?expand=1#diff-8be666c70553b1f0017a01458c490f47R323]
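
The deadlock risk can be sketched in isolation as follows. This is an illustration under stated assumptions, not code from the patch: {{callerLock}} is a hypothetical lock owned by the submitting thread, and the single-threaded executor is a stand-in for the gossip stage. A timeout is used so the example terminates; with an untimed {{get()}} the two threads would wait on each other forever.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;

final class BlockingSubmitSketch
{
    // Returns true if waiting on the task timed out while the caller held
    // the lock, and the task then completed once the lock was released.
    static boolean waitTimesOutWhileHoldingLock()
    {
        ReentrantLock callerLock = new ReentrantLock();               // hypothetical lock
        ExecutorService stage = Executors.newSingleThreadExecutor();  // hypothetical gossip stage
        try
        {
            Future<?> future;
            boolean timedOut = false;

            callerLock.lock();
            try
            {
                // The submitted task needs the lock the submitter is holding.
                future = stage.submit(() -> {
                    callerLock.lock();
                    try
                    {
                        // state guarded by callerLock would be touched here
                    }
                    finally
                    {
                        callerLock.unlock();
                    }
                });

                try
                {
                    // Waiting while still holding the lock: the task cannot progress.
                    future.get(200, TimeUnit.MILLISECONDS);
                }
                catch (TimeoutException e)
                {
                    timedOut = true;
                }
            }
            finally
            {
                callerLock.unlock();
            }

            // With the lock released, the task can acquire it and finish.
            future.get(5, TimeUnit.SECONDS);
            return timedOut && future.isDone();
        }
        catch (InterruptedException | ExecutionException | TimeoutException e)
        {
            throw new IllegalStateException(e);
        }
        finally
        {
            stage.shutdown();
        }
    }
}
```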

This is extremely fragile. {{Gossiper.markDead}} calls out to every subscriber. 
Also we call into {{Gossiper.convict}} from {{FailureDetector}} but then call 
back out to {{FailureDetector}} from {{Gossiper}}. I am only just beginning to 
track down where we call in and out.

{{Gossiper.assassinate}} also calls out to various things via 
{{Gossiper.handleMajorStateChange}}.

I am not 100% sure of the right way to tease this apart into a good design. If 
this action occurs asynchronously, then that should be explicit: the method 
should return a {{ListenableFuture}} that is either ignored (if the action 
really can occur asynchronously without issue), waited on safely while not 
holding any locks, or used to resume execution later by attaching a listener.
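
The three ways of handling such a future could look roughly like this. To keep the sketch self-contained it uses the JDK's {{CompletableFuture}} as a stand-in for Guava's {{ListenableFuture}}; the class name, the counter, and the single-threaded executor standing in for the gossip stage are all assumptions made for the example.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

final class AsyncOptionsSketch
{
    // Runs one task per option and returns how many steps completed.
    static int demonstrate()
    {
        ExecutorService stage = Executors.newSingleThreadExecutor(); // hypothetical gossip stage
        AtomicInteger completed = new AtomicInteger();
        try
        {
            // Option 1: ignore the future; the caller does not care when it finishes.
            CompletableFuture.runAsync(() -> completed.incrementAndGet(), stage);

            // Option 2: wait for the result, but only while holding no locks.
            CompletableFuture<Void> awaited =
                CompletableFuture.runAsync(() -> completed.incrementAndGet(), stage);
            awaited.join();

            // Option 3: attach a continuation instead of blocking. With thenRun the
            // continuation runs on the thread that completes the future, or on the
            // calling thread if the future is already complete.
            CompletableFuture<Void> resumed =
                CompletableFuture.runAsync(() -> completed.incrementAndGet(), stage)
                                 .thenRun(() -> completed.incrementAndGet());
            resumed.join(); // joined here only so the sketch can report a final count

            return completed.get();
        }
        finally
        {
            stage.shutdown();
        }
    }
}
```

Because the executor is single-threaded and runs tasks in submission order, the ignored task from option 1 has already finished by the time the later joins return.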

The fan-in and fan-out are high enough that it's going to take me some time to 
figure out whether the proposed change is safe.

> Gossiper#markAlive can race with Gossiper#markDead
> --------------------------------------------------
>
>                 Key: CASSANDRA-15059
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15059
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Gossip
>            Reporter: Blake Eggleston
>            Assignee: Blake Eggleston
>            Priority: Normal
>
> The Gossiper class is not threadsafe and assumes all state changes happen in 
> a single thread (the gossip stage). Gossiper#convict, however, can be called 
> from the GossipTasks thread. This creates a race where calls to 
> Gossiper#markAlive and Gossiper#markDead can interleave, corrupting gossip 
> state. Gossiper#assassinateEndpoint has a similar problem, being called from 
> the mbean server thread.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
