[
https://issues.apache.org/jira/browse/CASSANDRA-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16801208#comment-16801208
]
Ariel Weisberg commented on CASSANDRA-15059:
--------------------------------------------
[You don't need to create a {{ListenableFutureTask}}; you can just {{submit}} the
{{Runnable}} to the {{ExecutorService}} and it will return a
{{Future}}|https://github.com/apache/cassandra/compare/cassandra-3.0...bdeggleston:15059-3.0?expand=1#diff-8be666c70553b1f0017a01458c490f47R332].
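As a minimal sketch of that point, using only {{java.util.concurrent}}: the class {{SubmitSketch}}, the method {{runOnStage}}, and the single-threaded {{stage}} executor are hypothetical stand-ins for illustration, not the code in the patch or Cassandra's actual stage executor.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class SubmitSketch
{
    // Hypothetical helper: runs one task on a single-threaded executor
    // (a stand-in for the gossip stage) and reports whether it ran.
    static boolean runOnStage()
    {
        ExecutorService stage = Executors.newSingleThreadExecutor();
        AtomicBoolean ran = new AtomicBoolean(false);
        try
        {
            // submit(Runnable) already returns a Future, so no explicit
            // future-task wrapper has to be constructed by the caller.
            Future<?> future = stage.submit(() -> ran.set(true));

            // For a Runnable, get() returns null once the task has completed.
            future.get(5, TimeUnit.SECONDS);
            return ran.get();
        }
        catch (Exception e)
        {
            throw new RuntimeException(e);
        }
        finally
        {
            stage.shutdown();
        }
    }

    public static void main(String[] args)
    {
        System.out.println(runOnStage());
    }
}
```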
[This creates significant risk of deadlock if the Gossiper ends up needing to
acquire something owned by the thread submitting the task. Document that risk
here.|https://github.com/apache/cassandra/compare/cassandra-3.0...bdeggleston:15059-3.0?expand=1#diff-8be666c70553b1f0017a01458c490f47R323]
This is extremely fragile. {{Gossiper.markDead}} calls out to every subscriber.
Also, we call into {{Gossiper.convict}} from {{FailureDetector}}, but then call
back out to {{FailureDetector}} from {{Gossiper}}. I am only just beginning to
track down where we call in and out.
{{Gossiper.assassinate}} also calls out to various things via
{{Gossiper.handleMajorStateChange}}.
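To make the deadlock risk mentioned above concrete, here is a rough sketch under assumed names: {{DeadlockRiskSketch}}, {{callerLock}}, and {{stage}} are hypothetical, and the lock stands in for anything the submitting thread owns that a subscriber callback might need. The wait uses a timeout so the example terminates instead of hanging; an untimed {{get()}} in the same position would block forever.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;

final class DeadlockRiskSketch
{
    // Returns true if waiting on the submitted task timed out, i.e. the
    // task could not make progress while the submitter held the lock.
    static boolean blockedWhileHoldingLock()
    {
        // Hypothetical stand-in for the single-threaded gossip stage.
        ExecutorService stage = Executors.newSingleThreadExecutor();
        // Hypothetical resource owned by the submitting thread.
        ReentrantLock callerLock = new ReentrantLock();

        callerLock.lock();
        try
        {
            Future<?> future = stage.submit(() -> {
                // The task needs something the submitter owns, so it
                // blocks here until the submitter releases the lock.
                callerLock.lock();
                callerLock.unlock();
            });
            try
            {
                // The submitter waits for the task while still holding the lock.
                future.get(200, TimeUnit.MILLISECONDS);
                return false;
            }
            catch (TimeoutException e)
            {
                return true;
            }
            catch (InterruptedException | ExecutionException e)
            {
                throw new RuntimeException(e);
            }
        }
        finally
        {
            // Releasing the lock lets the blocked task finish so the
            // executor thread can exit after shutdown().
            callerLock.unlock();
            stage.shutdown();
        }
    }

    public static void main(String[] args)
    {
        System.out.println(blockedWhileHoldingLock());
    }
}
```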
I am not 100% sure of the right way to tease this apart into a good design. If
this action occurs asynchronously, then that should be explicit: it should
return a {{ListenableFuture}} that is either ignored (if the action really can
occur asynchronously without issue), waited on safely while not holding any
locks, or used to resume execution later by attaching a listener.
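The three ways of consuming such a future could look roughly like the sketch below. It is only an illustration: it uses the JDK's {{CompletableFuture}} in place of Guava's {{ListenableFuture}} to stay dependency-free, and {{ExplicitAsyncSketch}}, {{markDeadAsync}}, {{stage}}, and {{log}} are hypothetical names rather than anything in {{Gossiper}}.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class ExplicitAsyncSketch
{
    // Hypothetical stand-in for the single-threaded gossip stage.
    private final ExecutorService stage = Executors.newSingleThreadExecutor();
    private final List<String> log = Collections.synchronizedList(new ArrayList<>());

    // Hypothetical state change that is explicitly asynchronous: the
    // returned future is the only way to observe its completion.
    CompletableFuture<Void> markDeadAsync(String endpoint)
    {
        return CompletableFuture.runAsync(() -> log.add("dead:" + endpoint), stage);
    }

    static List<String> demo()
    {
        ExplicitAsyncSketch gossiper = new ExplicitAsyncSketch();
        try
        {
            // 1. Ignore the future: acceptable only if nothing depends on completion.
            gossiper.markDeadAsync("a");

            // 2. Wait on the future: the caller must not hold any locks here.
            gossiper.markDeadAsync("b").get(5, TimeUnit.SECONDS);

            // 3. Resume later: attach a continuation instead of blocking.
            CompletableFuture<Void> resumed =
                gossiper.markDeadAsync("c").thenRun(() -> gossiper.log.add("resumed:c"));

            // Waiting here only so the demo can return a complete log.
            resumed.get(5, TimeUnit.SECONDS);
            return new ArrayList<>(gossiper.log);
        }
        catch (Exception e)
        {
            throw new RuntimeException(e);
        }
        finally
        {
            gossiper.stage.shutdown();
        }
    }

    public static void main(String[] args)
    {
        System.out.println(demo());
    }
}
```

Because the stand-in stage is single-threaded, the three state changes are applied in submission order, and the continuation in the third case runs only after its own state change has completed.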
The fan-in and fan-out are high enough that it's going to take some time for me
to figure out whether the proposed change is safe.
> Gossiper#markAlive can race with Gossiper#markDead
> --------------------------------------------------
>
> Key: CASSANDRA-15059
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15059
> Project: Cassandra
> Issue Type: Bug
> Components: Cluster/Gossip
> Reporter: Blake Eggleston
> Assignee: Blake Eggleston
> Priority: Normal
>
> The Gossiper class is not threadsafe and assumes all state changes happen in
> a single thread (the gossip stage). Gossiper#convict, however, can be called
> from the GossipTasks thread. This creates a race where calls to
> Gossiper#markAlive and Gossiper#markDead can interleave, corrupting gossip
> state. Gossiper#assassinateEndpoint has a similar problem, being called from
> the mbean server thread.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)