[jira] [Commented] (CASSANDRA-16146) Node state incorrectly set to NORMAL after nodetool disablegossip and enablegossip during bootstrap

David Capwell (Jira) Mon, 02 Nov 2020 18:53:48 -0800


    [ https://issues.apache.org/jira/browse/CASSANDRA-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17225093#comment-17225093 ]


David Capwell commented on CASSANDRA-16146:
-------------------------------------------

Turns out this broke the jvm-dtest 
org.apache.cassandra.distributed.test.ClientNetworkStopStartTest:

{code}
[junit-timeout] Testcase: stopStartNative(org.apache.cassandra.distributed.test.ClientNetworkStopStartTest):    FAILED
[junit-timeout] nodetool command [enablebinary] was not successful
[junit-timeout] Notifications:
[junit-timeout] Error:
[junit-timeout] java.lang.IllegalStateException: Unable to start native transport because the node is not in the normal state.
[junit-timeout]         at org.apache.cassandra.service.StorageService.checkServiceAllowedToStart(StorageService.java:4389)
[junit-timeout]         at org.apache.cassandra.service.StorageService.startNativeTransport(StorageService.java:427)
[junit-timeout]         at org.apache.cassandra.tools.NodeProbe.startNativeTransport(NodeProbe.java:963)
[junit-timeout]         at org.apache.cassandra.tools.nodetool.EnableBinary.execute(EnableBinary.java:31)
[junit-timeout]         at org.apache.cassandra.tools.NodeTool$NodeToolCmd.runInternal(NodeTool.java:287)
[junit-timeout]         at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:272)
[junit-timeout]         at org.apache.cassandra.tools.NodeTool.execute(NodeTool.java:178)
[junit-timeout]         at org.apache.cassandra.distributed.impl.Instance$DTestNodeTool.execute(Instance.java:843)
[junit-timeout]         at org.apache.cassandra.distributed.impl.Instance.lambda$nodetoolResult$30(Instance.java:778)
[junit-timeout]         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
[junit-timeout]         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[junit-timeout]         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[junit-timeout]         at org.apache.cassandra.concurrent.NamedThreadFactory.lambda$threadLocalDeallocator$0(NamedThreadFactory.java:83)
[junit-timeout]         at java.lang.Thread.run(Thread.java:748)
{code}

Found in 
https://app.circleci.com/pipelines/github/dcapwell/cassandra/752/workflows/cf4c3766-de15-4903-88f9-20a3dbcfd331/jobs/4268
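
The exception in the trace is consistent with a guard that refuses to start a client-facing service unless the node is in the normal state. A minimal sketch of such a guard follows. It is only an illustration: the class name, the {{Mode}} enum, and the method signature are assumptions made for this example, and the real {{StorageService.checkServiceAllowedToStart}} may be implemented differently.

```java
// Illustrative sketch only; these are NOT Cassandra's classes or signatures.
final class NativeTransportGuardSketch {
    // Hypothetical, simplified set of operation modes.
    enum Mode { STARTING, JOINING, NORMAL }

    // Rejects the start request unless the node has reached NORMAL.
    // The message mirrors the one seen in the test failure.
    static void checkServiceAllowedToStart(Mode mode, String serviceName) {
        if (mode != Mode.NORMAL) {
            throw new IllegalStateException(
                "Unable to start " + serviceName
                + " because the node is not in the normal state.");
        }
    }

    public static void main(String[] args) {
        checkServiceAllowedToStart(Mode.NORMAL, "native transport"); // passes
        try {
            checkServiceAllowedToStart(Mode.JOINING, "native transport");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Under this model, any test that calls enablebinary while the node reports a state other than normal would fail in the same way as the trace.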

> Node state incorrectly set to NORMAL after nodetool disablegossip and 
> enablegossip during bootstrap
> ---------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-16146
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16146
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Cluster/Gossip
>            Reporter: Yifan Cai
>            Assignee: Yifan Cai
>            Priority: Normal
>             Fix For: 3.0.x, 3.11.x, 4.0-beta3
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> At a high level, {{StorageService#setGossipTokens}} sets the gossip state to 
> {{NORMAL}} blindly. Therefore, re-enabling gossip (stopping and restarting it) 
> overwrites the actual gossip state.
>   
> This can happen in the following scenario:
> # Bootstrap fails. The gossip state remains {{BOOT}} / {{JOINING}} and 
> code execution exits {{StorageService#initServer}}.
> # The operator runs nodetool to stop and restart gossip. The gossip state gets 
> flipped to {{NORMAL}}.
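
The scenario in the quoted description can be sketched as a toy model. This is only an illustration under stated assumptions: {{NodeSketch}}, its {{GossipState}} enum, and its methods are hypothetical names, not Cassandra's API, and the guarded variant shows one possible way to avoid the overwrite, not necessarily what the actual patch does.

```java
// Toy model of the reported behavior; none of these names exist in Cassandra.
final class NodeSketch {
    // Hypothetical, simplified gossip states taken from the issue description.
    enum GossipState { BOOT, JOINING, NORMAL }

    private GossipState state;
    private boolean gossipRunning = true;

    NodeSketch(GossipState initial) {
        this.state = initial;
    }

    void stopGossip() {
        gossipRunning = false;
    }

    // Models the bug: restarting gossip advertises NORMAL no matter
    // which state the node had actually reached.
    void startGossipBlind() {
        gossipRunning = true;
        state = GossipState.NORMAL;
    }

    // One possible guard: restarting gossip leaves the state untouched,
    // so a node whose bootstrap failed keeps reporting JOINING.
    void startGossipGuarded() {
        gossipRunning = true;
    }

    GossipState state() {
        return state;
    }

    boolean isGossipRunning() {
        return gossipRunning;
    }
}
```

In this model, a node created in {{JOINING}} that goes through {{stopGossip()}} and {{startGossipBlind()}} ends up in {{NORMAL}}, while the same sequence with {{startGossipGuarded()}} leaves it in {{JOINING}}.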


