[ https://issues.apache.org/jira/browse/CASSANDRA-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14951117#comment-14951117 ]

Joel Knighton edited comment on CASSANDRA-10231 at 10/9/15 8:16 PM:
--------------------------------------------------------------------

I think the force blocking flush approach is the least invasive and the most
likely to ensure correctness.

With log entries, I've confirmed that the behavior I suspected occurs. Before
commitlog replay, we {{populateTokenMetadata}} for node1, node2, and node3.
After commitlog replay, when we {{populateTokenMetadata}} again, we only
consider node2 and node3, but node1 stays present in the {{tokenMetadata}}.
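To make this concrete, here's a standalone toy model of the add-only population (invented names and plain Java collections, not Cassandra code): repopulating from a newer snapshot never evicts an entry added from an older one.

{code:java}
import java.util.HashSet;
import java.util.Set;

// Toy model: population only ever adds entries, so a peer missing from a
// later pass is never evicted from the metadata built by an earlier pass.
public class StalePeerSketch
{
    public static void main(String[] args)
    {
        Set<String> tokenMetadata = new HashSet<>();

        // Pass 1 (before commitlog replay): node1's deletion hasn't been
        // replayed yet, so all three peers are read and added.
        for (String peer : new String[]{ "node1", "node2", "node3" })
            tokenMetadata.add(peer);

        // Pass 2 (after commitlog replay): the deletion is now visible, so
        // only node2 and node3 are considered -- but nothing removes node1.
        for (String peer : new String[]{ "node2", "node3" })
            tokenMetadata.add(peer);

        System.out.println(tokenMetadata); // node1 is still present
    }
}
{code}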

I pushed a branch 
[10231-alternate|https://github.com/jkni/cassandra/commits/10231-alternate] 
with a {{forceBlockingFlush}} only in {{removeEndpoint}}. I'll create a
follow-up ticket to further discuss the use of {{forceBlockingFlush}} for other
{{PEERS}}-related methods in {{SystemKeyspace}}.
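For reference, the shape of the change is roughly the following (a hedged sketch with stand-in stubs for the real {{SystemKeyspace}} helpers, not the patch itself):

{code:java}
import java.net.InetAddress;

// Sketch of the shape of the change; executeInternal/forceBlockingFlush are
// stand-in stubs here, not the actual Cassandra implementations.
public class RemoveEndpointSketch
{
    static final String PEERS = "peers";

    static void executeInternal(String cql, Object... binds)
    {
        // stand-in: the deletion reaches the commitlog and memtable only;
        // nothing is written to sstables yet
    }

    static void forceBlockingFlush(String table)
    {
        // stand-in: blocks until the table's memtable is flushed, so the
        // deletion becomes visible to a restart-time read from sstables
    }

    public static synchronized void removeEndpoint(InetAddress ep)
    {
        executeInternal(String.format("DELETE FROM system.%s WHERE peer = ?", PEERS), ep);
        forceBlockingFlush(PEERS); // the addition on the branch
    }
}
{code}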

With this change, the attached dtest passes.

In CI, there are no unit test failures out of the ordinary.

In CI, there is only one dtest failure outside of historically flaky tests and
tests with known problems.
This failure is in {{commitlog_test.TestCommitLog.stop_failure_policy_test}}
and is reproducible locally. In the original patch, upon commitlog failure,
shutting down gossip would notify {{onChange}}, which in {{handleStateNormal}}
would {{updateTokens}} for the local node; {{updateTokens}} would call
{{removeEndpoint}}, causing the thread to hang in {{forceBlockingFlush}}
(because the commitlog had already failed).
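A toy model of the hang, with nothing Cassandra-specific in it: once the flush can no longer complete, anything blocking on it waits forever.

{code:java}
import java.util.concurrent.CountDownLatch;

// Toy model: after a commitlog failure under the stop policy, the flush
// never completes, so a blocking caller waits indefinitely.
public class FlushHangSketch
{
    public static void main(String[] args) throws InterruptedException
    {
        CountDownLatch flushDone = new CountDownLatch(1); // never counted down

        // Models onChange -> handleStateNormal -> updateTokens(local) ->
        // removeEndpoint -> forceBlockingFlush during gossip shutdown.
        Thread gossipShutdown = new Thread(() -> {
            try
            {
                flushDone.await(); // forceBlockingFlush after the commitlog has failed
            }
            catch (InterruptedException e)
            {
                Thread.currentThread().interrupt();
            }
        });
        gossipShutdown.start();

        gossipShutdown.join(1000); // wait a second, then observe the stuck thread
        System.out.println("still blocked in flush: " + gossipShutdown.isAlive());
    }
}
{code}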

Looking at git history, it seems this call to {{removeEndpoint}} is
precautionary: there is currently no gossip transition that results in the
local node being present in {{PEERS}}. As a result, I've removed this call
from {{updateTokens}}, and the above commitlog test now passes. This commit
has been pushed to the branch
[10231-alternate|https://github.com/jkni/cassandra/commits/10231-alternate].
The attached dtest still passes, as expected.
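The follow-up change looks roughly like this (again a sketch with stand-ins, not the actual commit):

{code:java}
import java.net.InetAddress;
import java.util.Collection;

// Sketch of the follow-up change: updateTokens returns early for the local
// node instead of calling removeEndpoint (which would block in
// forceBlockingFlush after a commitlog failure).
public class UpdateTokensSketch
{
    static InetAddress getBroadcastAddress() { return InetAddress.getLoopbackAddress(); }

    static void insertPeerTokens(InetAddress ep, Collection<String> tokens)
    {
        // stand-in for the INSERT into system.peers
    }

    public static synchronized void updateTokens(InetAddress ep, Collection<String> tokens)
    {
        if (ep.equals(getBroadcastAddress()))
            return; // before: removeEndpoint(ep); return; -- precautionary, now removed

        insertPeerTokens(ep, tokens);
    }
}
{code}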

I'm waiting for CI to finish for this change; in the meantime, any feedback or 
review would be great.

> Null status entries on nodes that crash during decommission of a different 
> node
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10231
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10231
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Joel Knighton
>            Assignee: Joel Knighton
>             Fix For: 3.0.0 rc2
>
>         Attachments: n1.log, n2.log, n3.log, n4.log, n5.log
>
>
> This issue is reproducible through a Jepsen test of materialized views that 
> crashes and decommissions nodes throughout the test.
> In a 5 node cluster, if a node crashes at a certain point (unknown) during 
> the decommission of a different node, it may start with a null entry for the 
> decommissioned node like so:
> {{DN 10.0.0.5 ? 256 ? null rack1}}
> This entry does not get updated/cleared by gossip. This entry is removed upon 
> a restart of the affected node.
> This issue is further detailed in ticket 
> [10068|https://issues.apache.org/jira/browse/CASSANDRA-10068].


