[ 
https://issues.apache.org/jira/browse/CASSANDRA-10371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15148332#comment-15148332
 ] 

Stefania commented on CASSANDRA-10371:
--------------------------------------

Ideally we would need the logs at TRACE level. I'm afraid the lines above don't 
really help much.

You can raise the log level at runtime with these commands, which avoids a restart:

{code}
nodetool setlogginglevel org.apache.cassandra.gms.Gossiper TRACE
nodetool setlogginglevel org.apache.cassandra.gms.GossipShutdownVerbHandler TRACE
nodetool setlogginglevel org.apache.cassandra.gms.GossipDigestAckVerbHandler TRACE
nodetool setlogginglevel org.apache.cassandra.gms.GossipDigestAck2VerbHandler TRACE
nodetool setlogginglevel org.apache.cassandra.gms.FailureDetector TRACE
nodetool setlogginglevel org.apache.cassandra.service.StorageService TRACE
{code}

To reset the log levels to the initial configuration:

{code}
nodetool setlogginglevel
{code}

Use {{-h}} to specify a host if required. Obviously, if this is a production 
cluster, you may want to wait.
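
If you would rather not type each command by hand, the commands above can be wrapped in a small loop. This is only a sketch: the function names, the {{GOSSIP_LOGGERS}} list variable and the {{NODETOOL}} override are my own names for illustration and not part of Cassandra. It assumes {{nodetool}} is on the PATH and accepts {{-h}} before the subcommand.

```shell
#!/bin/sh
# Sketch only: NODETOOL, GOSSIP_LOGGERS and both function names are
# hypothetical helpers for illustration, not something Cassandra ships.

# Path to nodetool; can be overridden from the environment.
NODETOOL="${NODETOOL:-nodetool}"

# The loggers listed in the commands above, one per line.
GOSSIP_LOGGERS="
org.apache.cassandra.gms.Gossiper
org.apache.cassandra.gms.GossipShutdownVerbHandler
org.apache.cassandra.gms.GossipDigestAckVerbHandler
org.apache.cassandra.gms.GossipDigestAck2VerbHandler
org.apache.cassandra.gms.FailureDetector
org.apache.cassandra.service.StorageService
"

# Usage: set_gossip_log_level LEVEL [HOST]
# Runs "nodetool setlogginglevel <logger> LEVEL" for every logger in the
# list, adding "-h HOST" when a host is given. Stops at the first failure.
set_gossip_log_level() {
  level="$1"
  host="${2:-}"
  for logger in $GOSSIP_LOGGERS; do
    if [ -n "$host" ]; then
      "$NODETOOL" -h "$host" setlogginglevel "$logger" "$level" || return 1
    else
      "$NODETOOL" setlogginglevel "$logger" "$level" || return 1
    fi
  done
}

# Usage: reset_log_levels [HOST]
# Calls "nodetool setlogginglevel" with no class and no level, which is
# the reset form shown above.
reset_log_levels() {
  host="${1:-}"
  if [ -n "$host" ]; then
    "$NODETOOL" -h "$host" setlogginglevel
  else
    "$NODETOOL" setlogginglevel
  fi
}
```

For example, after sourcing the file, {{set_gossip_log_level TRACE}} would raise all six loggers on the local node, and {{reset_log_levels}} would undo it once the logs have been collected.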

If you see a digest message for the node causing problems, it may be arriving 
from a single host. See 
[this comment|https://issues.apache.org/jira/browse/CASSANDRA-10371?focusedCommentId=15068186&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15068186] 
above for a technique to find that host. Then, before draining and restarting 
the host causing the problem, we would need the logs at TRACE level for this 
host.

> Decommissioned nodes can remain in gossip
> -----------------------------------------
>
>                 Key: CASSANDRA-10371
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10371
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Distributed Metadata
>            Reporter: Brandon Williams
>            Assignee: Stefania
>            Priority: Minor
>
> This may apply to other dead states as well.  Dead states should be expired 
> after 3 days.  In the case of decom we attach a timestamp to let the other 
> nodes know when it should be expired.  It has been observed that sometimes a 
> subset of nodes in the cluster never expire the state, and through heap 
> analysis of these nodes it is revealed that the epstate.isAlive check returns 
> true when it should return false, which would allow the state to be evicted.  
> This may have been affected by CASSANDRA-8336.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
