[ https://issues.apache.org/jira/browse/CASSANDRA-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14935568#comment-14935568 ]
T Jake Luciani edited comment on CASSANDRA-10413 at 9/29/15 6:22 PM:
---------------------------------------------------------------------
This looks similar to CASSANDRA-10262
was (Author: tjake):
This looks similar to https://issues.apache.org/jira/browse/CASSANDRA-10413
> Replaying materialized view updates from commitlog after node decommission crashes Cassandra
> --------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-10413
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10413
> Project: Cassandra
> Issue Type: Bug
> Reporter: Joel Knighton
> Priority: Critical
> Fix For: 3.0.0 rc2
>
> Attachments: n1.log, n2.log, n3.log, n4.log, n5.log
>
>
> This issue is reproducible through a Jepsen test, runnable as
> {code}
> lein with-profile +trunk test :only cassandra.mv-test/mv-crash-subset-decommission
> {code}
> This test crashes and restarts nodes while also decommissioning nodes. These
> actions are not coordinated with each other.
> In [10164|https://issues.apache.org/jira/browse/CASSANDRA-10164], we
> introduced a change to re-apply materialized view updates on commitlog replay.
> Some nodes crash during commitlog replay after a restart. They throw the
> "Trying to get the view natural endpoint on a non-data replica"
> RuntimeException from getViewNaturalEndpoint. I added logging to
> getViewNaturalEndpoint to show the results of
> replicationStrategy.getNaturalEndpoints for the baseToken and the viewToken.
> The logs show that the crash occurs when baseEndpoints and viewEndpoints
> are identical but do not contain the broadcast address of the local node.
> For example, a node at 10.0.0.5 crashes on replay of a write whose base token
> and view token replicas are both [10.0.0.2, 10.0.0.4, 10.0.0.6]. It seems we
> try to guard against this by considering pendingEndpoints for the viewToken,
> but this does not appear to be sufficient.
> I've attached the system.log files from a test run with the added logging. In
> the attached logs, n1 is at 10.0.0.2, n2 is at 10.0.0.3, and so on. 10.0.0.6/n5
> is the decommissioned node.
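To make the failure condition concrete, here is a simplified, hypothetical model of the base-to-view replica pairing described above. It assumes the logic works roughly like this: replicas shared between the base token and the view token are dropped, the remaining base and view replicas are paired by position, a node that is itself a view replica writes locally, and a non-empty set of pending view endpoints lets the local node stand in for the view replica. The class name, method name, and the viewHasPending flag are illustrative only; this is not Cassandra's actual getViewNaturalEndpoint, and it omits data-center filtering, tokens, and the replication strategy. Fed the reported scenario (local node 10.0.0.5, base and view replicas both [10.0.0.2, 10.0.0.4, 10.0.0.6], no pending view endpoints), it throws the same exception message seen during replay.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model of view-replica pairing. NOT Cassandra's
// getViewNaturalEndpoint: data centers, tokens and the replication strategy
// are omitted, and endpoints are plain address strings.
public class ViewEndpointSketch {

    // Returns the view replica the local node should send its view update to.
    // baseReplicas / viewReplicas stand in for getNaturalEndpoints(baseToken)
    // and getNaturalEndpoints(viewToken); viewHasPending stands in for
    // "pending endpoints exist for the viewToken" (an assumption).
    public static String viewNaturalEndpoint(String local,
                                             List<String> baseReplicas,
                                             List<String> viewReplicas,
                                             boolean viewHasPending) {
        List<String> base = new ArrayList<>(baseReplicas);
        List<String> view = new ArrayList<>();
        for (String v : viewReplicas) {
            // A node that is itself a view replica applies the view update locally.
            if (v.equals(local)) {
                return v;
            }
            // A replica shared by base and view would pair with itself, so it is
            // removed from the base side and never added to the view side.
            if (!base.remove(v)) {
                view.add(v);
            }
        }
        // Pair the remaining base and view replicas by position.
        int idx = base.indexOf(local);
        if (idx < 0) {
            // Guard modeled on the pendingEndpoints check mentioned in the issue.
            if (viewHasPending) {
                return local;
            }
            throw new RuntimeException(
                "Trying to get the view natural endpoint on a non-data replica");
        }
        return view.get(idx);
    }

    public static void main(String[] args) {
        // Reported scenario: 10.0.0.5 replays a write whose base and view
        // replicas are both [10.0.0.2, 10.0.0.4, 10.0.0.6].
        List<String> replicas = Arrays.asList("10.0.0.2", "10.0.0.4", "10.0.0.6");
        try {
            viewNaturalEndpoint("10.0.0.5", replicas, replicas, false);
        } catch (RuntimeException e) {
            System.out.println("replay fails: " + e.getMessage());
        }
    }
}
```

In this model, once every view replica is also a base replica and the local node is neither, both lists end up empty, the local node has no position to pair on, and only the pending-endpoint guard avoids the exception. That matches the observation that the guard is not sufficient after a decommission, when no pending view endpoints remain but the replaying node no longer owns the range.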
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)