[ https://issues.apache.org/jira/browse/CASSANDRA-15059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815537#comment-16815537 ]

Blake Eggleston edited comment on CASSANDRA-15059 at 4/11/19 4:03 PM:
----------------------------------------------------------------------

Maybe we could compromise a bit here. There’s definitely value in rooting out 
places where we violate the assumptions gossiper makes about concurrency, but I 
worry that if we don’t catch all of them we'll turn a race that _may_ cause a 
problem into a hard failure. What if we logged an error by default, but threw 
an exception if a system property was set? This way we can get feedback and 
scary messages in the logs, but we haven't made things any less stable, and our 
tests will fail if we’re doing something we shouldn’t be. I have no problem 
putting that in 3.x and up.
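A minimal sketch of the "log an error by default, throw if a system property is set" idea. It assumes a hypothetical property `cassandra.gossip.strict_thread_check` and a gossip stage thread named `GossipStage`. Neither name comes from Cassandra's code. It uses java.util.logging only to stay self-contained.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.logging.Logger;

final class GossipThreadCheck {
    private static final Logger logger = Logger.getLogger(GossipThreadCheck.class.getName());

    // Hypothetical property name; not an actual Cassandra flag.
    static final String STRICT_PROPERTY = "cassandra.gossip.strict_thread_check";

    // Hypothetical name for the single gossip-stage thread.
    static final String GOSSIP_THREAD_NAME = "GossipStage";

    /** Returns true on the gossip stage; otherwise logs an error, or throws in strict mode. */
    static boolean checkGossipStage(String operation) {
        String current = Thread.currentThread().getName();
        if (current.equals(GOSSIP_THREAD_NAME))
            return true;
        String msg = operation + " called from thread " + current + " instead of " + GOSSIP_THREAD_NAME;
        // Boolean.getBoolean is true only if the property exists and equals "true".
        if (Boolean.getBoolean(STRICT_PROPERTY))
            throw new IllegalStateException(msg); // tests: fail hard
        logger.severe(msg);                       // production: loud log, behavior unchanged
        return false;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService gossipStage =
            Executors.newSingleThreadExecutor(r -> new Thread(r, GOSSIP_THREAD_NAME));
        try {
            boolean onStage = gossipStage.submit(() -> checkGossipStage("markAlive")).get();
            System.out.println("on gossip stage: " + onStage);
            // Called from the main thread without the property: logs and continues.
            System.out.println("lenient check passed: " + checkGossipStage("convict"));
            System.setProperty(STRICT_PROPERTY, "true");
            try {
                checkGossipStage("convict");
            } catch (IllegalStateException e) {
                System.out.println("strict mode threw: " + e.getMessage());
            }
        } finally {
            gossipStage.shutdown();
        }
    }
}
```

A test JVM would set the property (e.g. `-Dcassandra.gossip.strict_thread_check=true`, again hypothetical) so violations fail the build, while production nodes only log.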

I would like to punt the refactor to 4.next though. I do think it’s valuable, 
but I think it’s a bit late in the game to add it to 4.0.

WDYT?


was (Author: bdeggleston):
Maybe we could compromise a bit here. There’s definitely value in rooting out 
places where we violate the assumptions gossiper makes about concurrency, but I 
worry that if we don’t catch all of them we'll turn a relatively low impact 
race (low enough that we haven't caught it) into a hard failure. What if we 
logged an error by default, but threw an exception if a system property was 
set? This way we can get feedback and scary messages in the logs, but we 
haven't made things any less stable, and our tests will fail if we’re doing 
something we shouldn’t be. I have no problem putting that in 3.x and up.

I would like to punt the refactor to 4.next though. I do think it’s valuable, 
but I think it’s a bit late in the game to add it to 4.0.

WDYT?

> Gossiper#markAlive can race with Gossiper#markDead
> --------------------------------------------------
>
>                 Key: CASSANDRA-15059
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15059
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Gossip
>            Reporter: Blake Eggleston
>            Assignee: Blake Eggleston
>            Priority: Normal
>
> The Gossiper class is not thread-safe and assumes all state changes happen in 
> a single thread (the gossip stage). Gossiper#convict, however, can be called 
> from the GossipTasks thread. This creates a race where calls to 
> Gossiper#markAlive and Gossiper#markDead can interleave, corrupting gossip 
> state. Gossiper#assassinateEndpoint has a similar problem, since it is called 
> from the MBean server thread.
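One general way to restore the single-thread assumption is thread confinement. All state changes run on one executor, and callers on other threads (a failure-detector timer, a JMX call) submit work to it instead of mutating state directly. This is a hypothetical sketch: `ConfinedGossiper`, the `GossipStage` thread name and the boolean liveness map are illustrative stand-ins, not Cassandra's Gossiper and not the fix committed for this ticket.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

final class ConfinedGossiper {
    // One thread standing in for the gossip stage (hypothetical name).
    private final ExecutorService gossipStage =
        Executors.newSingleThreadExecutor(r -> new Thread(r, "GossipStage"));

    // Plain HashMap: safe only because every read and write runs on gossipStage.
    private final Map<String, Boolean> liveness = new HashMap<>();

    // Safe to call from any thread: each update is queued, never applied in place.
    Future<?> markAlive(String endpoint) {
        return gossipStage.submit(() -> { liveness.put(endpoint, true); });
    }

    Future<?> markDead(String endpoint) {
        return gossipStage.submit(() -> { liveness.put(endpoint, false); });
    }

    // e.g. invoked from a timer thread; it routes through the stage like everything else.
    Future<?> convict(String endpoint) {
        return markDead(endpoint);
    }

    Future<Boolean> isAlive(String endpoint) {
        return gossipStage.submit(() -> liveness.getOrDefault(endpoint, false));
    }

    void shutdown() throws InterruptedException {
        gossipStage.shutdown();
        gossipStage.awaitTermination(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        ConfinedGossiper g = new ConfinedGossiper();
        g.markAlive("10.0.0.1");
        g.convict("10.0.0.1");
        // Tasks run one at a time in submission order, so the convict wins here.
        System.out.println("alive: " + g.isAlive("10.0.0.1").get());
        g.shutdown();
    }
}
```

Because the executor has a single thread, markAlive and markDead can no longer interleave mid-update; calls from different threads are applied one at a time in the order they were submitted.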



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
