[jira] [Commented] (CASSANDRA-10231) Null status entries on nodes that crash during decommission of a different node

Joel Knighton (JIRA) Thu, 08 Oct 2015 16:42:00 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14949611#comment-14949611
 ]


Joel Knighton commented on CASSANDRA-10231:
-------------------------------------------

Just to clarify, since the scope changed: some of the symptoms seen in the 
initial Jepsen tests may have been addressed by related gossip tickets, and 
those failures are no longer reproducible.

The 3.0 patch, which does not try to remove hints if the {{hostId}} of the 
{{LEFT}} endpoint is null, looks good to me on code quality.

I'm generally +1 on the dtest. I've pushed a version with some nitpicking 
(spelling, unused code removed) 
[here|https://github.com/jkni/cassandra-dtest/tree/10231-nits]. I'm definitely 
in favor of such a 

That said, I'd like to propose an idea for the root cause which should be 
fixed. With this root cause fixed, the 3.0 patch should no longer be necessary.

I believe this issue was introduced in 3.0, which would explain why you could 
not reproduce on 2.1 or 2.2.

To accommodate [CASSANDRA-6696], in [CASSANDRA-9317], we started populating 
TokenMetadata before commitlog replay. If we revert [CASSANDRA-9317], the dtest 
no longer reproduces the issue.

If the changes to the {{PEERS}} table in the SystemKeyspace upon removing an 
endpoint are not flushed to disk and are instead in the commitlog, when we 
populate TokenMetadata, we will populate these tokens for the decommissioned 
node. This can be seen in the logs of node2 in the dtest.

Since the node has left, it is quarantined and gossip updates for it are not 
applied, so these tokens are never removed from TokenMetadata. This is the 
cause of the stale status entries.
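
To make the ordering concrete, here is a toy model of the sequence described above. It is only a sketch: {{StaleTokenSketch}} and every field and method in it are hypothetical names, not Cassandra's API, and it assumes the decommissioned endpoint is already quarantined by the time gossip tries to correct the ring view.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model of the startup ordering; all names are hypothetical, not Cassandra's API.
class StaleTokenSketch {
    // Peer rows already flushed to disk: endpoint -> token description.
    final Map<String, String> peersOnDisk = new HashMap<>();
    // Endpoint removals that only reached the commitlog before the crash.
    final List<String> unflushedRemovals = new ArrayList<>();
    // In-memory ring view, standing in for TokenMetadata.
    final Map<String, String> tokenMetadata = new HashMap<>();
    // Endpoints whose gossip updates are ignored because they have left (assumption).
    final Set<String> quarantined = new HashSet<>();

    void startup() {
        // Step 1: the ring view is populated from flushed rows only.
        tokenMetadata.putAll(peersOnDisk);
        // Step 2: commitlog replay corrects the table, but nothing revisits the ring view.
        for (String endpoint : unflushedRemovals) {
            peersOnDisk.remove(endpoint);
        }
        unflushedRemovals.clear();
    }

    /** Returns true if the removal was applied, false if it was dropped. */
    boolean applyGossipRemoval(String endpoint) {
        if (quarantined.contains(endpoint)) {
            return false; // update ignored, so the stale tokens stay
        }
        tokenMetadata.remove(endpoint);
        return true;
    }

    public static void main(String[] args) {
        StaleTokenSketch node = new StaleTokenSketch();
        node.peersOnDisk.put("10.0.0.5", "256 tokens");
        node.unflushedRemovals.add("10.0.0.5");
        node.quarantined.add("10.0.0.5");
        node.startup();
        System.out.println("gossip removal applied: " + node.applyGossipRemoval("10.0.0.5"));
        System.out.println("row still in peers table: " + node.peersOnDisk.containsKey("10.0.0.5"));
        System.out.println("tokens still in ring view: " + node.tokenMetadata.containsKey("10.0.0.5"));
    }
}
```

In this model the peers table ends up correct after replay, but the ring view keeps the entry for the departed node, which matches the symptom in the issue description.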

To ensure these changes are flushed to disk when a node is {{LEFT}}, we can 
call {{forceBlockingFlush}} on {{PEERS}} in {{SystemKeyspace.removeEndpoint}}. 
With this change, the dtest passes. I've pushed a branch with this fix 
[here|https://github.com/jkni/cassandra/tree/10231-alternate].
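
As a rough illustration of why a blocking flush closes the window, the sketch below models the write path only. It is not the code on the branch: {{PeersFlushSketch}}, its {{removeEndpoint}} signature, and the {{blockingFlush}} flag are hypothetical stand-ins for the idea of flushing the peers table before returning.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model of the write path; names are hypothetical, not Cassandra's API.
class PeersFlushSketch {
    // Peer rows that are durable on disk and visible before commitlog replay.
    private final Set<String> onDisk = new HashSet<>();
    // Removals accepted in memory and in the commitlog, but not yet flushed.
    private final Set<String> pendingRemovals = new HashSet<>();

    void addFlushedPeer(String endpoint) {
        onDisk.add(endpoint);
    }

    void removeEndpoint(String endpoint, boolean blockingFlush) {
        pendingRemovals.add(endpoint);
        if (blockingFlush) {
            flush(); // do not return until the removal is on disk
        }
    }

    private void flush() {
        onDisk.removeAll(pendingRemovals);
        pendingRemovals.clear();
    }

    /** What early startup would read if the node crashed right now. */
    Set<String> rowsVisibleBeforeReplay() {
        return new HashSet<>(onDisk);
    }

    /** Removes 10.0.0.5, then "crashes" and reports what startup would load. */
    static Set<String> visibleAfterCrash(boolean blockingFlush) {
        PeersFlushSketch table = new PeersFlushSketch();
        table.addFlushedPeer("10.0.0.5");
        table.removeEndpoint("10.0.0.5", blockingFlush);
        return table.rowsVisibleBeforeReplay();
    }

    public static void main(String[] args) {
        System.out.println("without flush: " + visibleAfterCrash(false));
        System.out.println("with flush:    " + visibleAfterCrash(true));
    }
}
```

Without the flush, the removed endpoint is still visible to anything that reads the table before replay. With it, the removal is durable before {{removeEndpoint}} returns, at the cost of a synchronous flush on a path that should run rarely.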

Thoughts [~Stefania]?

> Null status entries on nodes that crash during decommission of a different 
> node
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10231
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10231
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Joel Knighton
>            Assignee: Stefania
>             Fix For: 3.0.0 rc2
>
>         Attachments: n1.log, n2.log, n3.log, n4.log, n5.log
>
>
> This issue is reproducible through a Jepsen test of materialized views that 
> crashes and decommissions nodes throughout the test.
> In a 5 node cluster, if a node crashes at a certain point (unknown) during 
> the decommission of a different node, it may start with a null entry for the 
> decommissioned node like so:
> DN 10.0.0.5 ? 256 ? null rack1
> This entry does not get updated/cleared by gossip. This entry is removed upon 
> a restart of the affected node.
> This issue is further detailed in ticket 
> [10068|https://issues.apache.org/jira/browse/CASSANDRA-10068].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
