[
https://issues.apache.org/jira/browse/CASSANDRA-12008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15402388#comment-15402388
]
Paulo Motta commented on CASSANDRA-12008:
-----------------------------------------
Thanks for the update. This is looking better and we're nearly done; see the
follow-up below:
* Code
** Fix indentation of {{logger.debug("DECOMMISSIONING")}}
** The {{isDecommissioning.get()}} should use a {{compareAndSet}} to avoid
starting simultaneous decommission sessions. See the {{isRebuilding}} check.
Also, add a test to verify it's not possible to start multiple decommissions
simultaneously, based on the solution from CASSANDRA-11687 to avoid test flakiness.
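A minimal sketch of the guard, mirroring the existing {{isRebuilding}} pattern in {{StorageService}} (only the {{isDecommissioning}} field name is taken from the patch; the rest is illustrative):
{noformat}
// Only the thread that flips the flag may proceed; everyone else fails fast.
if (!isDecommissioning.compareAndSet(false, true))
    throw new IllegalStateException("Node is still decommissioning. Check nodetool netstats.");
try
{
    // ... leave the ring, unbootstrap/stream ranges ...
}
finally
{
    // Reset the flag so a failed attempt can be retried or resumed later.
    isDecommissioning.set(false);
}
{noformat}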
** On {{SessionCompleteEvent}}, use {{Collections.unmodifiableMap}} when copying
the {{transferredRangesPerKeyspace}} map to avoid later modifications to the map.
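For instance (a sketch only; the exact constructor shape in the patch may differ):
{noformat}
// Take a defensive snapshot and wrap it, so holders of the event cannot
// mutate the session's transferred ranges (and vice versa).
this.transferredRangesPerKeyspace =
    Collections.unmodifiableMap(new HashMap<>(transferredRangesPerKeyspace));
{noformat}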
** In order to avoid allocating a {{HashSet}} when it's not necessary, replace
this {noformat}
Set<Range<Token>> toBeUpdated = new HashSet<>();
if (transferredRangesPerKeyspace.containsKey(keyspace))
{
    toBeUpdated = transferredRangesPerKeyspace.get(keyspace);
}
{noformat} with this: {noformat}
Set<Range<Token>> toBeUpdated = transferredRangesPerKeyspace.get(keyspace);
if (toBeUpdated == null)
{
    toBeUpdated = new HashSet<>();
}
{noformat}
** {{Error while decommissioning node}} is never printed because the
{{ExecutionException}} is being wrapped in a {{RuntimeException}} inside
{{unbootstrap}}, so perhaps you can modify {{unbootstrap}} to throw
{{ExecutionException | InterruptedException}} and catch that in {{decommission}},
wrapping it in a {{RuntimeException}} there.
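Roughly along these lines (a sketch with bodies elided and signatures abridged, not the actual patch):
{noformat}
private void unbootstrap(Runnable onFinish) throws ExecutionException, InterruptedException
{
    ...
    // Let streaming failures propagate instead of wrapping them here.
    streamSuccess.get();
    ...
}

public void decommission() throws InterruptedException
{
    ...
    try
    {
        unbootstrap(finishLeaving);
    }
    catch (ExecutionException | InterruptedException e)
    {
        // Now the error actually gets logged before rethrowing.
        logger.error("Error while decommissioning node", e.getCause());
        throw new RuntimeException("Error while decommissioning node: " + e.getCause().getMessage());
    }
    ...
}
{noformat}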
* dtests
** Simply running {{stress read}} will not fail if the keys are not there; you
need to either compare the retrieved keys or check that there was no failure in
the stress process (see {{bootstrap_test}} for examples).
** When verifying that the retrieved data is correct in
{{resumable_decommission_test}}, you need to stop either node1 or node3 while
querying the other; otherwise the query may be served by just one of these
nodes, and you cannot verify the data is present on both (which it must be,
since RF=2 and N=2).
** Perhaps reduce the number of keys to 10k so the test will be faster.
** On {{resumable_decommission_test}}, set
{{stream_throughput_outbound_megabits_per_sec}} to {{1}} so the streaming will
be slower and allow more time for interrupting it.
** Perhaps it's better for {{InterruptDecommission}} to watch for {{rebuild from
dc}}, since this is printed before {{"Executing streaming plan for Unbootstrap"}}.
** Instead of counting occurrences of {{decommission_error}}, you can add a
{{self.fail("second decommission should fail")}} after
{{node2.nodetool('decommission')}}, and in the {{except}} block perhaps check
that the message {{Error while decommissioning node}} is printed in the logs -
see the new version of {{simple_rebuild_test}} from CASSANDRA-11687.
** bq. I found that streamed range skipping behaviour log check-up is not
working
*** This is probably because the {{Range
(-2556370087840976503,-2548250017122308073] already in /127.0.0.3, skipping}}
message is only printed to {{debug.log}}, so you should pass
{{filename='debug.log'}} to {{watch_log_for}}.
When you modify {{StreamStateStore}} to {{updateStreamedRanges}} for requested
ranges (i.e. bootstrap), there could be a collision between received and
transferred ranges for the same peer. While this collision will not show up in
decommission, bootstrap or rebuild, since we only transfer in one direction, it
may be confusing and a source of problems later. So, to avoid having to create
another table to support that in the future, I think we can modify
{{streamed_ranges}} to include an {{outgoing}} boolean primary key field
indicating whether it's an incoming or outgoing transfer. WDYT [~yukim] [~kdmu]?
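Something like the following (purely illustrative; the actual table and column layout of {{streamed_ranges}} in the patch may differ, the proposed {{outgoing}} flag is the only point here):
{noformat}
// Hypothetical system table definition, SystemKeyspace-style CQL string.
// Making "outgoing" part of the primary key keeps received and transferred
// ranges for the same peer/operation from colliding.
"CREATE TABLE %s ("
+ "operation text,"
+ "keyspace_name text,"
+ "peer inet,"
+ "outgoing boolean,"
+ "ranges set<blob>,"
+ "PRIMARY KEY ((operation, keyspace_name), peer, outgoing))"
{noformat}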
> Make decommission operations resumable
> --------------------------------------
>
> Key: CASSANDRA-12008
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12008
> Project: Cassandra
> Issue Type: Improvement
> Components: Streaming and Messaging
> Reporter: Tom van der Woerdt
> Assignee: Kaide Mu
> Priority: Minor
>
> We're dealing with large data sets (multiple terabytes per node) and
> sometimes we need to add or remove nodes. These operations are very dependent
> on the entire cluster being up, so while we're joining a new node (which
> sometimes takes 6 hours or longer) a lot can go wrong and in a lot of cases
> something does.
> It would be great if the ability to retry streams was implemented.
> Example to illustrate the problem:
> {code}
> 03:18 PM ~ $ nodetool decommission
> error: Stream failed
> -- StackTrace --
> org.apache.cassandra.streaming.StreamException: Stream failed
>     at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85)
>     at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310)
>     at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
>     at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
>     at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
>     at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
>     at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:210)
>     at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:186)
>     at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:430)
>     at org.apache.cassandra.streaming.StreamSession.complete(StreamSession.java:622)
>     at org.apache.cassandra.streaming.StreamSession.messageReceived(StreamSession.java:486)
>     at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:274)
>     at java.lang.Thread.run(Thread.java:745)
> 08:04 PM ~ $ nodetool decommission
> nodetool: Unsupported operation: Node in LEAVING state; wait for status to become normal or restart
> See 'nodetool help' or 'nodetool help <command>'.
> {code}
> Streaming failed, probably due to load:
> {code}
> ERROR [STREAM-IN-/<ipaddr>] 2016-06-14 18:05:47,275 StreamSession.java:520 - [Stream #<streamid>] Streaming error occurred
> java.net.SocketTimeoutException: null
>     at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:211) ~[na:1.8.0_77]
>     at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103) ~[na:1.8.0_77]
>     at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385) ~[na:1.8.0_77]
>     at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:54) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:268) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_77]
> {code}
> If implementing retries is not possible, can we have a 'nodetool decommission
> resume'?