[jira] [Commented] (CASSANDRA-10231) Null status entries on nodes that crash during decommission of a different node

Stefania (JIRA) Sun, 11 Oct 2015 19:36:22 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14952551#comment-14952551
 ]


Stefania commented on CASSANDRA-10231:
--------------------------------------

I agree with your analysis. Only 3 methods insert into {{PEERS}}. Of these, 
{{updatePeerInfo}} and {{updateTokens}} do not insert anything if the endpoint 
is the local broadcast address, whilst the remaining method, 
{{updatePreferredIP}}, is only called from {{OutboundTcpConnectionPool}} for 
remote endpoints. None of the getters expect to find the local ep in {{PEERS}}. 
The code in {{SS.initServer()}} seems to confirm this further, both with the 
comment at line 614 and by removing the local ep should it be found (IMO this 
should have been an assertion but let's leave it).
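
To make the invariant concrete, here is a minimal sketch of the guard pattern described above. It is not the actual Cassandra code: {{PeersSketch}}, its in-memory map, the string endpoints and {{removeLocalEndpointIfPresent}} are all hypothetical stand-ins, and only the method name {{updatePeerInfo}} is borrowed from the discussion.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a PEERS-like table; not Cassandra's code. */
final class PeersSketch
{
    private final String localBroadcastAddress;
    // endpoint -> (column -> value); assumed shape, for illustration only
    private final Map<String, Map<String, String>> peers = new HashMap<>();

    PeersSketch(String localBroadcastAddress)
    {
        this.localBroadcastAddress = localBroadcastAddress;
    }

    /** Writer-side guard: the local endpoint is never inserted. */
    void updatePeerInfo(String endpoint, String column, String value)
    {
        if (endpoint.equals(localBroadcastAddress))
            return;
        peers.computeIfAbsent(endpoint, k -> new HashMap<>()).put(column, value);
    }

    /**
     * Startup-side cleanup: drop the local endpoint should it be found.
     * Returns true if a row was actually removed.
     */
    boolean removeLocalEndpointIfPresent()
    {
        return peers.remove(localBroadcastAddress) != null;
    }

    boolean contains(String endpoint)
    {
        return peers.containsKey(endpoint);
    }
}
```

With the writer-side guard in place, the startup cleanup should never find anything to remove, which is why an assertion would arguably express the intent better than a silent removal.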

I've also checked the latest round of CI and the results seem in line with the 
unpatched branch. 

The patch is therefore +1.

Only one tiny nit: I think people _generally_ prefer to drop the braces for 
{{if}} one-liners, but it's not really in the coding standards so it's your 
choice.

If you are also happy, you can flag this ticket as "READY TO COMMIT" and find a 
committer on IRC.

> Null status entries on nodes that crash during decommission of a different 
> node
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10231
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10231
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Joel Knighton
>            Assignee: Joel Knighton
>             Fix For: 3.0.0 rc2
>
>         Attachments: n1.log, n2.log, n3.log, n4.log, n5.log
>
>
> This issue is reproducible through a Jepsen test of materialized views that 
> crashes and decommissions nodes throughout the test.
> In a 5 node cluster, if a node crashes at a certain point (unknown) during 
> the decommission of a different node, it may start with a null entry for the 
> decommissioned node like so:
> DN 10.0.0.5 ? 256 ? null rack1
> This entry does not get updated/cleared by gossip. This entry is removed upon 
> a restart of the affected node.
> This issue is further detailed in ticket 
> [10068|https://issues.apache.org/jira/browse/CASSANDRA-10068].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
