[jira] [Assigned] (CASSANDRA-13308) Gossip breaks, Hint files not being deleted on nodetool decommission

Jeff Jirsa (JIRA) Thu, 16 Mar 2017 14:48:12 -0700

     [ https://issues.apache.org/jira/browse/CASSANDRA-13308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Jeff Jirsa reassigned CASSANDRA-13308:
--------------------------------------

    Assignee: Jeff Jirsa

> Gossip breaks, Hint files not being deleted on nodetool decommission
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-13308
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13308
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Streaming and Messaging
>         Environment: Using Cassandra version 3.0.9
>            Reporter: Arijit
>            Assignee: Jeff Jirsa
>         Attachments: 28207.stack, logs, logs_decommissioned_node
>
>
> How to reproduce the issue I'm seeing:
> Shut down Cassandra on one node of the cluster and wait until the other 
> nodes accumulate a large number of hints for it. Start Cassandra on the 
> node and immediately run "nodetool decommission" on it.
> The node streams its replicas and marks itself as DECOMMISSIONED, but the 
> other nodes do not seem to see this message. "nodetool status" shows the 
> decommissioned node in state "UL" on all other nodes (it is also present in 
> system.peers), and the Cassandra logs show that gossip tasks on those nodes 
> are not proceeding (the number of pending tasks keeps increasing). jstack 
> suggests that a gossip task is blocked on hints dispatch (I can provide 
> traces if this is not obvious). Because the cluster is large and there are 
> a lot of hints, the dispatch takes a long time.
> On inspecting "/var/lib/cassandra/hints" on the nodes, I see a number of 
> hint files for the decommissioned node. The documentation seems to suggest 
> that these hints should be deleted during "nodetool decommission", but that 
> does not seem to be the case here. This is the bug being reported.
> To recover from this scenario, if I manually delete the hint files on the 
> nodes, the hints dispatcher threads throw a number of exceptions and the 
> decommissioned node moves to state "DL" (perhaps it missed some gossip 
> messages?). The node is still in my "system.peers" table.
> Restarting Cassandra on all nodes after this step does not fix the issue 
> (the node remains in the peers table). In fact, after this point the 
> decommissioned node is in state "DN".
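
A minimal sketch of the hints-directory inspection described above, for
checking which endpoints still have hint files on disk. It assumes hint files
are named <host-id>-<timestamp>-<version>.hints; that naming is an assumption
about the 3.0 on-disk layout and should be verified against the version in
use. count_hint_files and HINT_NAME are hypothetical names, not part of
Cassandra or nodetool.

```python
import re
from collections import Counter
from pathlib import Path

# ASSUMPTION: hint files are named <host-id UUID>-<timestamp>-<version>.hints.
# Verify this against the Cassandra version in use before relying on it.
HINT_NAME = re.compile(
    r"^([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
    r"-\d+-\d+\.hints$"
)


def count_hint_files(hints_dir):
    """Return a Counter mapping host ID -> number of matching .hints files.

    Hypothetical helper: files that do not match the assumed naming
    pattern (for example checksum files) are ignored.
    """
    counts = Counter()
    for entry in Path(hints_dir).iterdir():
        match = HINT_NAME.match(entry.name)
        if match and entry.is_file():
            counts[match.group(1)] += 1
    return counts
```

Calling count_hint_files("/var/lib/cassandra/hints") on each node and
comparing the keys with the host ID that "nodetool status" reports for the
decommissioned node would show whether its hint files are still present.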



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
