[ https://issues.apache.org/jira/browse/CASSANDRA-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14936372#comment-14936372 ]

Joel Knighton commented on CASSANDRA-10413:
-------------------------------------------

It looks like two possible situations were causing this.

1. A node receives the decommission and excises the other node from its system.peers 
table. On restart after a hard crash, it loads the tokens from the system keyspace, 
which still contains the excised node. Since MV updates appear in the commitlog ahead 
of the records excising the node, they are replayed against that stale ring and the 
error occurs.

2. A node crashes during the decommission of the other node. When it comes back 
up, it will not have excised the decommissioned node.

To address situation 1, the simplest fix is to force a blocking flush in 
removeEndpoint.
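
Roughly, something like this in SystemKeyspace (a sketch, not the exact diff on the 
branch; the executeInternal / PEERS / forceBlockingFlush names follow 3.0-era 
internals):

{code}
// Sketch only -- not the actual patch. Assumes the 3.0-era SystemKeyspace helpers:
// executeInternal (statically imported from QueryProcessor), the PEERS table name
// constant, and forceBlockingFlush(String).
public static synchronized void removeEndpoint(InetAddress ep)
{
    executeInternal(String.format("DELETE FROM system.%s WHERE peer = ?", PEERS), ep);
    // Flush synchronously so the excision survives a hard crash; otherwise the
    // delete lives only in the commitlog and is replayed after the earlier MV
    // updates, reproducing situation 1.
    forceBlockingFlush(PEERS);
}
{code}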

Situation 2 is less clear. I took Jake's suggestion to check the gossip state in 
mutateMV and, when the node is not JOINED, always write to the batchlog. I'm not yet 
sure that selecting the local address as the view replica is always appropriate in 
this situation.
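
The shape of that check is roughly the following (again a sketch under assumed 
3.0-era names -- StorageService.instance.isJoined(), Batch.createLocal, 
BatchlogManager.store -- rather than the exact code on the branch):

{code}
// Sketch only -- illustrates the guard in StorageProxy.mutateMV, not the exact patch.
public static void mutateMV(ByteBuffer dataKey, Collection<Mutation> mutations, boolean writeCommitLog)
{
    if (!StorageService.instance.isJoined())
    {
        // Not in the JOINED state (e.g. commitlog replay after a crash): the ring
        // cannot be trusted to pair base and view replicas, so store the view
        // updates in the local batchlog to be replayed once the node has joined.
        BatchlogManager.store(Batch.createLocal(UUIDGen.getTimeUUID(),
                                                FBUtilities.timestampMicros(),
                                                mutations));
        return;
    }
    // ... normal path: pair each base replica with a view replica via
    // getViewNaturalEndpoint and write the paired mutations as usual ...
}
{code}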

I've pushed a branch with these changes at 
[10413|https://github.com/jkni/cassandra/tree/10413].

With these changes, there are no longer crashes on commitlog replay. I'm still 
running tests.

> Replaying materialized view updates from commitlog after node decommission 
> crashes Cassandra
> --------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-10413
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10413
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Joel Knighton
>            Assignee: T Jake Luciani
>            Priority: Critical
>             Fix For: 3.0.0 rc2
>
>         Attachments: n1.log, n2.log, n3.log, n4.log, n5.log
>
>
> This issue is reproducible through a Jepsen test, runnable as
> {code}
> lein with-profile +trunk test :only cassandra.mv-test/mv-crash-subset-decommission
> {code}
> This test crashes/restarts nodes while decommissioning nodes. These actions 
> are not coordinated.
> In [10164|https://issues.apache.org/jira/browse/CASSANDRA-10164], we 
> introduced a change to re-apply materialized view updates on commitlog replay.
> Some nodes, upon restart, will crash in commitlog replay. They throw the 
> "Trying to get the view natural endpoint on a non-data replica" runtime 
> exception in getViewNaturalEndpoint. I added logging to 
> getViewNaturalEndpoint to show the results of 
> replicationStrategy.getNaturalEndpoints for the baseToken and viewToken.
> It can be seen that these problems occur when the baseEndpoints and 
> viewEndpoints are identical but do not contain the broadcast address of the 
> local node.
> For example, a node at 10.0.0.5 crashes on replay of a write whose base token 
> and view token replicas are both [10.0.0.2, 10.0.0.4, 10.0.0.6]. It seems we 
> try to guard against this by considering pendingEndpoints for the viewToken, 
> but this does not appear to be sufficient.
> I've attached the system.logs for a test run with added logging. In the 
> attached logs, n1 is at 10.0.0.2, n2 is at 10.0.0.3, and so on. 10.0.0.6/n5 
> is the decommissioned node.



