[ 
https://issues.apache.org/jira/browse/CASSANDRA-12008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15389489#comment-15389489
 ] 

Paulo Motta commented on CASSANDRA-12008:
-----------------------------------------

Overall your approach looks good. thanks. See comments below:

bq. Node state is changed to LEAVING after decommission starts, and current 
source code prevents all states different from NORMAL to restart a decommission 
operation.

We can change that to allow starting a decommission operation only if state is 
{{NORMAL}} or {{LEAVING}}, and in addition to that have an 
{{isDecommissioning}} flag (similar to {{isRebuilding}} to prevent starting a 
deccommision while one is still running. We should unset that flag when the 
decommission finishes or fails.

bq. So maybe we should adapt StorageService.streamRanges to use RangeStreamer 
that already has all implemented.

I agree we can probably reuse the {{available_ranges}} table to store already 
transferred ranges during decommission, so let's start with that then since 
it's simpler. But we will probably need to truncate the table in the beginning 
of deccommision to cleanup previous state from bootstrap/rebuild. we will also 
need to update the table description to say that it's also used by decommission.

bq. 2. and 3. may not be necessary because StreamStateStore.handleStreamEvent 
always updates SystemKeyspace.availableRanges using 
SystemKeyspace.updateAvailableRanges no matter if the session transferred or 
received ranges

The problem is that {{RangeStreamer}} is used to receive ranges, while 
{{StorageService.streamRanges}} is used to send ranges, so I think adapting 
{{RangeStreamer}} to send ranges will be non-trivial. So it will probably 
easier to adapt {{StorageService.streamRanges}} to register the 
{{StreamStateStore}} as a {{StreamPlan}} listener so it will automatically 
store already transferred ranges during unbootstrap.

> Make decommission operations resumable
> --------------------------------------
>
>                 Key: CASSANDRA-12008
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12008
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Streaming and Messaging
>            Reporter: Tom van der Woerdt
>            Assignee: Kaide Mu
>            Priority: Minor
>
> We're dealing with large data sets (multiple terabytes per node) and 
> sometimes we need to add or remove nodes. These operations are very dependent 
> on the entire cluster being up, so while we're joining a new node (which 
> sometimes takes 6 hours or longer) a lot can go wrong and in a lot of cases 
> something does.
> It would be great if the ability to retry streams was implemented.
> Example to illustrate the problem :
> {code}
> 03:18 PM   ~ $ nodetool decommission
> error: Stream failed
> -- StackTrace --
> org.apache.cassandra.streaming.StreamException: Stream failed
>         at 
> org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85)
>         at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310)
>         at 
> com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
>         at 
> com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
>         at 
> com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
>         at 
> com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
>         at 
> org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:210)
>         at 
> org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:186)
>         at 
> org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:430)
>         at 
> org.apache.cassandra.streaming.StreamSession.complete(StreamSession.java:622)
>         at 
> org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:486)
>         at 
> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:274)
>         at java.lang.Thread.run(Thread.java:745)
> 08:04 PM   ~ $ nodetool decommission
> nodetool: Unsupported operation: Node in LEAVING state; wait for status to 
> become normal or restart
> See 'nodetool help' or 'nodetool help <command>'.
> {code}
> Streaming failed, probably due to load :
> {code}
> ERROR [STREAM-IN-/<ipaddr>] 2016-06-14 18:05:47,275 StreamSession.java:520 - 
> [Stream #<streamid>] Streaming error occurred
> java.net.SocketTimeoutException: null
>         at 
> sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:211) 
> ~[na:1.8.0_77]
>         at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103) 
> ~[na:1.8.0_77]
>         at 
> java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385) 
> ~[na:1.8.0_77]
>         at 
> org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:54)
>  ~[apache-cassandra-3.0.6.jar:3.0.6]
>         at 
> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:268)
>  ~[apache-cassandra-3.0.6.jar:3.0.6]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
> {code}
> If implementing retries is not possible, can we have a 'nodetool decommission 
> resume'?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to