[ https://issues.apache.org/jira/browse/CASSANDRA-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17238836#comment-17238836 ]
Sam Tunnicliffe commented on CASSANDRA-16213:
---------------------------------------------

Sorry I'm a bit late to the party here. I think the general approach of using the shadow round to learn persisted peer info about a down host is sound, but I believe this implementation is a bit overcomplicated. I don't think we need to modify the behaviour of assassinate or make any changes regarding which {{STATUS}} we can replace. I've pushed a simplified version here: https://github.com/beobal/cassandra/tree/beobal/16213-trunk

It doesn't currently pass your new dtests, but I think that's mostly because the expectations are specific to your implementation. Using ccm clusters, I've checked that all the scenarios your tests cover work as expected. Also, aside from the fix for replacing down nodes, all other behaviour is the same as current trunk with respect to assassinate, eviction of fat clients, etc. I'll try to dig into the dtests this week and update them.

> Cannot replace_address /X because it doesn't exist in gossip
> ------------------------------------------------------------
>
>                 Key: CASSANDRA-16213
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16213
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Gossip, Cluster/Membership
>            Reporter: David Capwell
>            Assignee: David Capwell
>            Priority: Normal
>             Fix For: 4.0-beta
>
>
> We see this exception around nodes crashing and trying to do a host replacement; this error appears to be correlated with multiple node failures.
> A simplified case to trigger this is the following:
> *) Have an N node cluster
> *) Shut down all N nodes
> *) Bring up N-1 nodes (at least 1 seed, else replace seed)
> *) Host replace the Nth node (the one still down) -> this will fail with the above exception
> The reason this happens is that the Nth node isn't gossiping anymore, and the existing nodes do not have its details in gossip (but do have them in the peers table), so the host replacement fails because the node isn't known in gossip.
> This affects all versions (tested 3.0 and trunk, assume 2.2 as well)

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
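The failure mode in the quoted description can be made concrete with a toy model. In this model, a replacing node that only consults live gossip state cannot resolve a host that survives only in the peers table. The comment proposes a different approach: also learn persisted peer info during the shadow round, which lets the lookup succeed. This is a minimal sketch under stated assumptions. The class `ReplaceSketch`, its two maps, and both lookup methods are hypothetical and do not reflect Cassandra's real `Gossiper` or `StorageService` code. Tokens are modeled as plain strings.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the CASSANDRA-16213 failure. All names are hypothetical;
// this does not mirror Cassandra's actual gossip or replacement code.
class ReplaceSketch {
    // Endpoint -> tokens, as currently known through live gossip state.
    final Map<String, String> gossipState = new HashMap<>();
    // Endpoint -> tokens, as persisted in a surviving node's peers table.
    final Map<String, String> persistedPeers = new HashMap<>();

    // Behaviour described in the report: only live gossip state is consulted.
    String tokensFromGossipOnly(String addr) {
        String tokens = gossipState.get(addr);
        if (tokens == null) {
            throw new IllegalStateException(
                "Cannot replace_address " + addr + " because it doesn't exist in gossip");
        }
        return tokens;
    }

    // Loose model of the approach in the comment: during the shadow round the
    // surviving nodes also share what they persisted about the down host.
    String tokensWithShadowRoundFallback(String addr) {
        String tokens = gossipState.get(addr);
        if (tokens == null) {
            tokens = persistedPeers.get(addr);
        }
        if (tokens == null) {
            throw new IllegalStateException("Unknown endpoint " + addr);
        }
        return tokens;
    }

    public static void main(String[] args) {
        ReplaceSketch cluster = new ReplaceSketch();
        // After a full-cluster restart, .1 and .2 came back and are gossiping;
        // .3 stayed down, so it only exists in the peers table of node .1.
        cluster.gossipState.put("127.0.0.1", "t1");
        cluster.gossipState.put("127.0.0.2", "t2");
        cluster.persistedPeers.put("127.0.0.2", "t2");
        cluster.persistedPeers.put("127.0.0.3", "t3");

        try {
            cluster.tokensFromGossipOnly("127.0.0.3");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        System.out.println(cluster.tokensWithShadowRoundFallback("127.0.0.3"));
    }
}
```

Running `main` prints the "doesn't exist in gossip" error for 127.0.0.3, followed by `t3` from the fallback lookup. The real fix has to carry this information through the gossip shadow round rather than a local map. The sketch only shows why consulting persisted peer info removes the dependency on the down host still being present in live gossip state.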