[ https://issues.apache.org/jira/browse/CASSANDRA-10371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148332#comment-15148332 ]
Stefania commented on CASSANDRA-10371:
--------------------------------------
Ideally we would need the logs at TRACE level. I'm afraid the lines above don't
really help much.
To change the logging levels at runtime without restarting the node, try these
commands:
{code}
nodetool setlogginglevel org.apache.cassandra.gms.Gossiper TRACE
nodetool setlogginglevel org.apache.cassandra.gms.GossipShutdownVerbHandler TRACE
nodetool setlogginglevel org.apache.cassandra.gms.GossipDigestAckVerbHandler TRACE
nodetool setlogginglevel org.apache.cassandra.gms.GossipDigestAck2VerbHandler TRACE
nodetool setlogginglevel org.apache.cassandra.gms.FailureDetector TRACE
nodetool setlogginglevel org.apache.cassandra.service.StorageService TRACE
{code}
To reset the levels to those in the node's logging configuration:
{code}
nodetool setlogginglevel
{code}
Use {{-h}} to specify a host if required. If this is a production cluster, you
may want to wait for a quieter period, since TRACE logging is very verbose.
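For convenience, the same commands can be wrapped in a small script. This is a
minimal sketch, assuming {{nodetool}} is on the PATH and can reach the target
host's JMX port; the function names and the {{NODETOOL}} override (useful for a
dry run with {{NODETOOL=echo}}) are hypothetical, not part of Cassandra.
{code}
#!/usr/bin/env bash
# Sketch: raise the gossip-related loggers to TRACE on one host, then reset.
# Assumptions: nodetool is on PATH and can reach the host's JMX port.
# NODETOOL is a hypothetical override, e.g. NODETOOL=echo for a dry run.
NODETOOL="${NODETOOL:-nodetool}"

# Logger names taken from the commands listed above.
GOSSIP_LOGGERS=(
  org.apache.cassandra.gms.Gossiper
  org.apache.cassandra.gms.GossipShutdownVerbHandler
  org.apache.cassandra.gms.GossipDigestAckVerbHandler
  org.apache.cassandra.gms.GossipDigestAck2VerbHandler
  org.apache.cassandra.gms.FailureDetector
  org.apache.cassandra.service.StorageService
)

# enable_gossip_trace [host]: set every logger above to TRACE,
# passing -h only when a host is given.
enable_gossip_trace() {
  local host_args=()
  if [ -n "${1:-}" ]; then host_args=(-h "$1"); fi
  local logger
  for logger in "${GOSSIP_LOGGERS[@]}"; do
    "$NODETOOL" "${host_args[@]}" setlogginglevel "$logger" TRACE || return 1
  done
}

# reset_logging [host]: setlogginglevel with no class and no level
# restores the levels from the node's logging configuration.
reset_logging() {
  local host_args=()
  if [ -n "${1:-}" ]; then host_args=(-h "$1"); fi
  "$NODETOOL" "${host_args[@]}" setlogginglevel
}
{code}
Source the script, then run for example {{enable_gossip_trace 10.0.0.1}} and
later {{reset_logging 10.0.0.1}}; the address is only a placeholder.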
If you see a digest message for the node causing problems, it may be arriving
from a single host; see [this comment|https://issues.apache.org/jira/browse/CASSANDRA-10371?focusedCommentId=15068186&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15068186]
above for a technique to find that host. Then, before draining and restarting
the host causing the problem, we would need the TRACE-level logs for that
host.
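Once TRACE is enabled on that host, the lines mentioning the problematic node
can be pulled out of its log before it is drained and restarted. This is a
minimal sketch, not the technique from the linked comment; the function name is
hypothetical, and the default path {{/var/log/cassandra/system.log}} is an
assumption that fits typical package installs, so adjust it for your setup.
{code}
#!/usr/bin/env bash
# Sketch: print the log lines that mention one endpoint address.
# Assumptions: hypothetical function name; the default log path is the
# usual one for package installs and may differ on your nodes.
extract_endpoint_lines() {
  local ip="$1"
  local log="${2:-/var/log/cassandra/system.log}"
  # -F: match the address as a fixed string, so dots are not wildcards.
  # -w: whole-word matches only, so 10.0.0.1 does not match 10.0.0.12.
  grep -Fw -- "$ip" "$log"
}
{code}
For example, {{extract_endpoint_lines 10.0.0.1 > trace-10.0.0.1.log}}, with a
placeholder address; attaching the full log is still preferable when its size
allows.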
> Decommissioned nodes can remain in gossip
> -----------------------------------------
>
> Key: CASSANDRA-10371
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10371
> Project: Cassandra
> Issue Type: Bug
> Components: Distributed Metadata
> Reporter: Brandon Williams
> Assignee: Stefania
> Priority: Minor
>
> This may apply to other dead states as well. Dead states should be expired
> after 3 days. In the case of decommission, we attach a timestamp to let the
> other nodes know when the state should be expired. It has been observed that
> sometimes a subset of nodes in the cluster never expire the state. Heap
> analysis of these nodes reveals that the epstate.isAlive check returns true
> when it should return false; returning false would allow the state to be
> evicted. This may have been affected by CASSANDRA-8336.
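To make the failure mode in the description concrete, here is a deliberately
simplified model of the expiry rule, not Cassandra's actual code: a dead state
may be evicted only once the endpoint is considered dead and its 3-day expiry
time has passed. The function names are hypothetical, and the real gossip check
may involve further conditions.
{code}
#!/usr/bin/env bash
# Simplified model (hypothetical, not Cassandra's code) of dead-state expiry.
# 3 days expressed in milliseconds: 259200000.
THREE_DAYS_MS=$((3 * 24 * 60 * 60 * 1000))

# expire_time_ms now_ms: the expiry timestamp a decommissioning node attaches.
expire_time_ms() { echo $(( $1 + THREE_DAYS_MS )); }

# can_evict is_alive now_ms expire_ms: succeeds only if the endpoint is seen
# as dead AND the expiry time has passed. A stale is_alive=true blocks
# eviction no matter how much time passes, matching the reported symptom.
can_evict() {
  local is_alive="$1" now="$2" expire="$3"
  [ "$is_alive" = false ] && [ "$now" -gt "$expire" ]
}
{code}
With {{is_alive}} stuck at {{true}}, {{can_evict}} never succeeds, which is why
the affected nodes keep the decommissioned node in gossip.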
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)