Arijit created CASSANDRA-13308:
----------------------------------
Summary: Hint files not being deleted on nodetool decommission
Key: CASSANDRA-13308
URL: https://issues.apache.org/jira/browse/CASSANDRA-13308
Project: Cassandra
Issue Type: Bug
Components: Streaming and Messaging
Environment: Using Cassandra version 3.0.9
Reporter: Arijit
Priority: Minor
How to reproduce the issue I'm seeing:
Shut down Cassandra on one node of the cluster and wait until the other nodes
accumulate a ton of hints for it. Start Cassandra on the node and immediately
run "nodetool decommission" on it.
The node streams its replicas and marks itself as DECOMMISSIONED, but other
nodes do not seem to see this message. "nodetool status" shows the
decommissioned node in state "UL", and Cassandra logs show that this is because
gossip tasks on nodes are blocked. Jstack shows that the tasks are blocked on
hints dispatch (I can provide traces if this is not obvious). Because the
cluster is large and there are a lot of hints, this is taking a while.
On inspecting "/var/lib/cassandra/hints" on the nodes, I see a bunch of hint
files for the decommissioned node. Documentation seems to suggest that these
hints should be deleted during "nodetool decommission".
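
To quantify what is left behind on each node, the hint files can be grouped by
target host. The following is only a minimal sketch: it assumes hint files are
named "<host-id UUID>-<timestamp>-<version>.hints", which should be verified
against the actual directory contents, and the helper name hints_by_host is
hypothetical, not part of Cassandra or nodetool.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: hint files are named <host-id UUID>-<timestamp>-<version>.hints.
# Anything that does not match (e.g. checksum files) is ignored.
HINT_FILE = re.compile(r"^([0-9a-fA-F-]{36})-(\d+)-(\d+)\.hints$")


def hints_by_host(hints_dir):
    """Return {host_id: (file_count, total_bytes)} for hint files in hints_dir."""
    totals = defaultdict(lambda: [0, 0])
    for path in Path(hints_dir).iterdir():
        match = HINT_FILE.match(path.name)
        if match is None or not path.is_file():
            continue
        entry = totals[match.group(1)]
        entry[0] += 1                      # one more hint file for this host
        entry[1] += path.stat().st_size    # accumulate its size in bytes
    return {host: (count, size) for host, (count, size) in totals.items()}
```

Calling hints_by_host("/var/lib/cassandra/hints") on a live node and looking up
the decommissioned node's host ID would show how many hint files (and bytes)
are still pending for it.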
When I manually delete hint files on the nodes, the hints dispatcher threads
throw a bunch of exceptions and the decommissioned node is now in state "DL"
(perhaps it missed some gossip messages?). The node is still in my
"system.peers" table.
Restarting Cassandra on all nodes after this step does not fix the issue (the
node remains in the peers table). In fact, after this point the decommissioned
node is in state "DN".