[jira] [Comment Edited] (CASSANDRA-18555) A new nodetool/JMX command that tells whether node's decommission failed or not

Stefan Miklosovic (Jira) Wed, 14 Jun 2023 09:19:10 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-18555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17732609#comment-17732609
 ]


Stefan Miklosovic edited comment on CASSANDRA-18555 at 6/14/23 4:18 PM:
------------------------------------------------------------------------

AFAIK that property is there for when you want to rejoin a node which was 
decommissioned. I am talking about a node on which decommission has already 
been run and on which you want to run decommission again, all while the 
Cassandra process is still running.

The trunk's logic is like this:
{code:java}
private void prepareToJoin() throws ConfigurationException
{
    if (!joined)
    {
        Map<ApplicationState, VersionedValue> appStates = new EnumMap<>(ApplicationState.class);

        if (SystemKeyspace.wasDecommissioned())
        {
            if (OVERRIDE_DECOMMISSION.getBoolean())
            {
                logger.warn("This node was decommissioned, but overriding by operator request.");
                SystemKeyspace.setBootstrapState(SystemKeyspace.BootstrapState.COMPLETED);
            }
            else
            {
                throw new ConfigurationException("This node was decommissioned and will not rejoin the ring unless -D" + OVERRIDE_DECOMMISSION.getKey() +
                                                 "=true has been set, or all existing data is removed and the node is bootstrapped again");
            }
        }
 {code}


was (Author: smiklosovic):
AFAIK that property is there for when you want to rejoin a node which was 
decommissioned. I am talking about a node on which decommission has already 
been run and on which you want to run decommission again. 

The trunk's logic is like this:
{code:java}
private void prepareToJoin() throws ConfigurationException
{
    if (!joined)
    {
        Map<ApplicationState, VersionedValue> appStates = new EnumMap<>(ApplicationState.class);

        if (SystemKeyspace.wasDecommissioned())
        {
            if (OVERRIDE_DECOMMISSION.getBoolean())
            {
                logger.warn("This node was decommissioned, but overriding by operator request.");
                SystemKeyspace.setBootstrapState(SystemKeyspace.BootstrapState.COMPLETED);
            }
            else
            {
                throw new ConfigurationException("This node was decommissioned and will not rejoin the ring unless -D" + OVERRIDE_DECOMMISSION.getKey() +
                                                 "=true has been set, or all existing data is removed and the node is bootstrapped again");
            }
        }
 {code}

> A new nodetool/JMX command that tells whether node's decommission failed or not
> -------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-18555
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-18555
>             Project: Cassandra
>          Issue Type: Task
>          Components: Observability/JMX
>            Reporter: Jaydeepkumar Chovatia
>            Assignee: Jaydeepkumar Chovatia
>            Priority: Normal
>          Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Currently, when a node is being decommissioned and a failure happens, an 
> exception is thrown back to the caller.
> But Cassandra's decommission takes considerable time, ranging from minutes to 
> hours to days. There are various scenarios in which the caller may need to 
> probe the status again:
>  * The caller times out
>  * It is not possible to keep the caller hanging for such a long time
> And if the caller does not know what happened internally, then it cannot 
> retry, etc., leading to other issues.
> So, in this ticket, I am going to add a new nodetool/JMX command that can be 
> invoked by the caller anytime, and it will return the correct status.
> It might look like a small change, but when we need to operate Cassandra at 
> scale in a large fleet, this becomes a bottleneck and requires constant 
> operator intervention.
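
The idea in the description above (record how decommission ended so that a caller can ask later, instead of relying only on the exception) might be sketched roughly as follows. This is a minimal illustration and not the actual patch: the class `DecommissionTracker`, the enum `Status` and the method names are all hypothetical, and a real implementation would expose the status through nodetool/JMX rather than a plain method.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch, not Cassandra's API: every name here is invented for illustration.
final class DecommissionTracker
{
    enum Status { NOT_STARTED, IN_PROGRESS, FAILED, COMPLETED }

    private final AtomicReference<Status> status = new AtomicReference<>(Status.NOT_STARTED);

    // Runs the (potentially very long) decommission work and records how it ended.
    void run(Runnable decommissionWork)
    {
        status.set(Status.IN_PROGRESS);
        try
        {
            decommissionWork.run();
            status.set(Status.COMPLETED);
        }
        catch (RuntimeException e)
        {
            status.set(Status.FAILED);
            throw e; // the original caller still sees the exception, as it does today
        }
    }

    // What a status command could return; can be called at any time from any thread.
    Status status()
    {
        return status.get();
    }

    // A caller that timed out earlier can use the recorded status to decide whether to retry.
    boolean shouldRetry()
    {
        return status.get() == Status.FAILED;
    }
}
```

With something like this, a caller that timed out can call `status()` later and retry only when it reports `FAILED`, without having to stay connected for the whole decommission.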



