[
https://issues.apache.org/jira/browse/CASSANDRA-18555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732552#comment-17732552
]
Stefan Miklosovic commented on CASSANDRA-18555:
-----------------------------------------------
Aha ... well, to put my way of thinking under scrutiny: let's imagine that a
decommission fails, we kill the node and we start it again (that scenario
itself is quite improbable, but anyway). The argument is "this is dangerous so
we need to save the state". OK, so we save the state, we see that the previous
decommission has failed - and now what? What are we going to do about that?
What other course of action could we take, on seeing that a node failed to
decommission, than to try to decommission it again? So the fact that it failed
to decommission _and that this state is persisted across a possible restart_ is
kind of useless.
If decommission is meant to be repeatable when it fails in the middle, as you
suggested, then knowing this across restarts is not helpful.
The whole decommissioning logic basically comes down to two methods in
StorageService: startLeaving() and unbootstrap().
startLeaving() just gossips that the status will be LEAVING so other nodes
know about it.
unbootstrap() repairs the Paxos topology, starts batchlog replay and hints
replay, and streams data to other nodes, all of which seems to be repeatable
without issues.
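To make the "repeatable" claim concrete, here is a minimal, self-contained sketch of that flow. All names below (DecommissionSketch, step names like repairPaxosTopology) are hypothetical and only model the startLeaving()/unbootstrap() sequence described above - this is not actual StorageService code. The key assumption is that each step is idempotent, so rerunning the whole sequence after a mid-way failure is safe:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a repeatable decommission: each step records itself
// only once, so a retried run re-executes the sequence without duplicating
// work that already completed.
public class DecommissionSketch {
    final List<String> completed = new ArrayList<>();
    private boolean failNext;

    DecommissionSketch(boolean failNext) { this.failNext = failNext; }

    // Runs the full sequence; safe to call again if a previous run failed.
    void decommission() {
        step("startLeaving");        // gossip LEAVING status to other nodes
        step("repairPaxosTopology");
        step("replayBatchlog");
        step("replayHints");
        step("streamRanges");        // stream owned data to remaining nodes
    }

    private void step(String name) {
        if (failNext && name.equals("replayHints")) {
            failNext = false;        // simulate a failure on the first attempt only
            throw new RuntimeException("failed during " + name);
        }
        if (!completed.contains(name)) {
            completed.add(name);     // idempotent: completed steps are not redone
        }
    }

    public static void main(String[] args) {
        DecommissionSketch node = new DecommissionSketch(true);
        try {
            node.decommission();
        } catch (RuntimeException e) {
            System.out.println("first attempt: " + e.getMessage());
        }
        node.decommission();         // retry after the failure succeeds
        System.out.println("completed=" + node.completed);
    }
}
```

The point of the sketch is only that, if the real steps behave idempotently as described, there is nothing to gain from persisting a "failed" flag across restarts - the retry is the same call either way.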
I do not see any dtest which would exercise a failed decommission, so we could
verify that it is indeed a repeatable operation.
I will check what it would take to gossip an unsuccessful decommission. I don't
have a clue how complex that would be, but my gut feeling is that it won't be
so easy. Let's see.
> A new nodetool/JMX command that tells whether node's decommission failed or
> not
> -------------------------------------------------------------------------------
>
> Key: CASSANDRA-18555
> URL: https://issues.apache.org/jira/browse/CASSANDRA-18555
> Project: Cassandra
> Issue Type: Task
> Components: Observability/JMX
> Reporter: Jaydeepkumar Chovatia
> Assignee: Jaydeepkumar Chovatia
> Priority: Normal
> Time Spent: 3h 40m
> Remaining Estimate: 0h
>
> Currently, when a node is being decommissioned, any failure results in an
> exception being thrown back to the caller.
> But Cassandra's decommission takes considerable time, ranging from minutes to
> hours to days. There are various scenarios in which the caller may need to
> probe the status again:
> * The caller times out
> * It is not possible to keep the caller hanging for such a long time
> And if the caller does not know what happened internally, then it cannot
> retry, etc., leading to other issues.
> So, in this ticket, I am going to add a new nodetool/JMX command that can be
> invoked by the caller at any time, and it will return the current status.
> It might look like a small change, but when operating Cassandra at scale in a
> large fleet, this becomes a bottleneck and requires constant operator
> intervention.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)