[jira] [Commented] (CASSANDRA-10674) Materialized View SSTable streaming/leaving status race on decommission

Paulo Motta (JIRA) Wed, 25 Nov 2015 15:28:19 -0800

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-10674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15027795#comment-15027795
 ]


Paulo Motta commented on CASSANDRA-10674:
-----------------------------------------

I agree with [~tjake] that the simplest fix here is to force the 
mutation into the local batchlog when the node is not a base replica of the 
mutation, and to log a warning if there are no pending ranges (since they may 
still be being calculated or may not yet have fully propagated via gossip). I 
implemented a patch based on this approach:

||3.0||trunk||
|[branch|https://github.com/apache/cassandra/compare/cassandra-3.0...pauloricardomg:3.0-10674]|[branch|https://github.com/apache/cassandra/compare/trunk...pauloricardomg:trunk-10674]|
|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-10674-testall/lastCompletedBuild/testReport/]|[testall|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-10674-testall/lastCompletedBuild/testReport/]|
|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-3.0-10674-dtest/lastCompletedBuild/testReport/]|[dtest|http://cassci.datastax.com/view/Dev/view/paulomotta/job/pauloricardomg-trunk-10674-dtest/lastCompletedBuild/testReport/]|

[~jkni] could you verify the jepsen tests with this approach and check if the 
warning is being printed?
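
For reference, the decision described above can be sketched roughly as follows. This is only an illustration and not the code in the patch: the class {{ViewWriteRouting}}, the {{Route}} enum, the node names and the warning text are all hypothetical, and replicas are modelled as plain strings rather than Cassandra endpoints.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: none of these types are real Cassandra classes.
class ViewWriteRouting {
    enum Route { NORMAL_PATH, LOCAL_BATCHLOG }

    // Collected warnings stand in for a real logger in this sketch.
    final List<String> warnings = new ArrayList<>();

    Route route(String localNode, Set<String> baseReplicas, Set<String> pendingEndpoints) {
        // A base replica of the mutation keeps using the usual view write path.
        if (baseReplicas.contains(localNode)) {
            return Route.NORMAL_PATH;
        }
        // Not a base replica: pending ranges would normally explain why the data
        // arrived here. If there are none, they may still be being calculated or
        // may not have propagated via gossip yet, so only warn (assumed wording).
        if (pendingEndpoints.isEmpty()) {
            warnings.add("Received view mutation on " + localNode
                         + " which is not a base replica and has no pending ranges");
        }
        // In both cases the mutation is forced into the local batchlog.
        return Route.LOCAL_BATCHLOG;
    }
}
```

The point of the sketch is that the missing pending ranges no longer cause an exception: the write is still kept via the batchlog, and the warning only makes the race visible in the logs.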

bq. Second and more importantly we should probably add an acknowledgement to 
the streaming operation that it was processed by the receiver correctly. 

It seems the stream receive task (and thus the stream session) is only 
completed on 
[2.1|https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/streaming/StreamReceiveTask.java#L175]
 and 
[2.2|https://github.com/apache/cassandra/blob/cassandra-2.2/src/java/org/apache/cassandra/streaming/StreamReceiveTask.java#L171]
 after the files are processed (otherwise it just hangs), but on 
[3.0|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/streaming/StreamReceiveTask.java#L231]
 it is always completed even if there was a failure, which seems more critical. 
In any case, we should probably fail the stream session if there is a problem 
while processing the received data. I created CASSANDRA-10774 to investigate 
and address that.
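
A minimal sketch of the behaviour suggested above (fail the session instead of completing it when processing the received data throws). It is not the actual {{StreamReceiveTask}} code: {{ReceiveTaskSketch}}, {{Session}}, {{processReceived}} and the string file names are hypothetical stand-ins.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: these are not the real Cassandra streaming classes.
class ReceiveTaskSketch {
    enum SessionState { IN_PROGRESS, COMPLETE, FAILED }

    static final class Session {
        SessionState state = SessionState.IN_PROGRESS;
        Throwable failure;

        void taskCompleted() { state = SessionState.COMPLETE; }

        void onError(Throwable t) {
            failure = t;
            state = SessionState.FAILED;
        }
    }

    // Applies every received file; the session is marked complete only if all
    // of them were processed without an exception.
    static void processReceived(Session session, List<String> files, Consumer<String> applyFile) {
        try {
            for (String file : files) {
                applyFile.accept(file);
            }
        } catch (RuntimeException e) {
            // Neither hang nor complete silently: surface the failure to the session.
            session.onError(e);
            return;
        }
        session.taskCompleted();
    }
}
```

With this shape, a failure while applying a received file leaves the session in a failed state instead of a completed or hanging one, so the failure is not silently lost.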

> Materialized View SSTable streaming/leaving status race on decommission
> -----------------------------------------------------------------------
>
>                 Key: CASSANDRA-10674
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10674
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Coordination, Distributed Metadata
>            Reporter: Joel Knighton
>            Assignee: Paulo Motta
>             Fix For: 3.0.1, 3.1
>
>         Attachments: leaving-node-debug.log, receiving-node-debug.log
>
>
> On decommission of a node in a cluster with materialized views, it is 
> possible for the decommissioning node to begin streaming sstables for an MV 
> base table before the receiving node is aware of the leaving status.
> The materialized view base/view replica pairing checks pending endpoints to 
> handle the case when an sstable is received from a leaving node; without the 
> leaving message, this check breaks and an exception is thrown. The streamed 
> sstable is never applied.
> Logs from a decommissioning node and a node receiving such a stream are 
> attached.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
