[jira] [Commented] (CASSANDRA-10371) Decommissioned nodes can remain in gossip

Joel Knighton (JIRA) Thu, 18 Feb 2016 09:38:34 -0800

    [ https://issues.apache.org/jira/browse/CASSANDRA-10371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15152695#comment-15152695 ]


Joel Knighton commented on CASSANDRA-10371:
-------------------------------------------

Thanks - those logs confirm my suspicion that 10.0.2.128 is propagating the 
EndpointState through the cluster and not evicting it. One more piece of 
information will allow me to root-cause this and suggest a fix.

If you connect to 10.0.2.128 over JMX, on 
org.apache.cassandra.net.FailureDetector, there should be an operation 
dumpInterArrivalTimes(). Invoking that operation over JMX will create a file in 
the Java temporary directory (likely just "/tmp") called "failuredetector-{SOME 
NUMBERS}.dat". If you could attach that file to this ticket, I can diagnose the 
issue further. There is no sensitive information in that file; it will just 
contain the samples of gossip arrival time for nodes in the cluster.
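
For anyone who prefers a small client over jconsole, the invocation described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class name FailureDetectorDump and the serviceUrl helper are hypothetical, port 7199 is assumed because it is the usual Cassandra JMX port, JMX authentication and SSL are assumed to be disabled, and the MBean is assumed to be registered as org.apache.cassandra.net:type=FailureDetector.

```java
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

// Hypothetical helper class; not part of Cassandra.
class FailureDetectorDump {
    // Assumed registration name of the failure detector MBean.
    static final String FAILURE_DETECTOR_MBEAN =
            "org.apache.cassandra.net:type=FailureDetector";

    // Builds the standard RMI-based JMX service URL for a host and port.
    static String serviceUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    public static void main(String[] args) throws Exception {
        // Defaults are assumptions: the node from this ticket and port 7199.
        String host = args.length > 0 ? args[0] : "10.0.2.128";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 7199;

        JMXServiceURL url = new JMXServiceURL(serviceUrl(host, port));
        // JMXConnector is Closeable, so the connection is released on exit.
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbs = connector.getMBeanServerConnection();
            // dumpInterArrivalTimes() takes no arguments and returns nothing;
            // the dump file is written on the Cassandra node, not locally.
            mbs.invoke(new ObjectName(FAILURE_DETECTOR_MBEAN),
                    "dumpInterArrivalTimes", new Object[0], new String[0]);
        }
    }
}
```

After it runs, the failuredetector-*.dat file should appear in the Java temporary directory on 10.0.2.128 itself, since the operation executes inside that node's JVM.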

Thanks again; your willingness to work with a running cluster that exhibits 
this issue is tremendously helpful.

> Decommissioned nodes can remain in gossip
> -----------------------------------------
>
>                 Key: CASSANDRA-10371
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10371
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Distributed Metadata
>            Reporter: Brandon Williams
>            Assignee: Joel Knighton
>            Priority: Minor
>
> This may apply to other dead states as well.  Dead states should be expired 
> after 3 days.  In the case of decom we attach a timestamp to let the other 
> nodes know when it should be expired.  It has been observed that sometimes a 
> subset of nodes in the cluster never expires the state. Heap analysis of 
> these nodes reveals that the epstate.isAlive check returns true when it 
> should return false; a false result would allow the state to be evicted.  
> This may have been affected by CASSANDRA-8336.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
