Some more observations. If the first drive fails on a node, you
can't simply remove it from the list of data directories. Example:
We have:
/data/1/cassandra
/data/2/cassandra
/data/3/cassandra
/data/4/cassandra
...
If /data/1 fails and you remove it from the list, then when you try to
start cassandra on that node
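For context, the list being discussed is the `data_file_directories` setting in cassandra.yaml. A sketch of the layout from the example above (paths as given; the comment reflects the behavior described in this thread, not official documentation):

```yaml
# cassandra.yaml (excerpt) - one data directory per physical drive.
# Removing a failed entry from this list does not relocate the SSTables
# and system data that lived on that drive, which appears to be why the
# node in the example still fails to start afterwards.
data_file_directories:
    - /data/1/cassandra
    - /data/2/cassandra
    - /data/3/cassandra
    - /data/4/cassandra
```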
Thanks for your help guys.
You were right, the problem actually came from a very heavy data-processing
job that runs every 2 hours starting at midnight. Node performance
was heavily degraded, causing one node to write hints because communication
with the other node was impaired.
Best
I had a drive fail (first drive in the list) on a Cassandra cluster.
I've stopped the node (as it no longer starts), and am trying to remove
it from the cluster, but the removenode command is hung (been running
for 3 hours so far):
`nodetool removenode status` keeps reporting the same token.
Those hosts are likely sending streams.
If you do `nodetool netstats` on the replicas of the node you're removing,
you should see byte counters and file counters - they should all be
incrementing. If one of them isn't incrementing, that one is probably stuck.
There's at least one bug in 4.1 that
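The check described above boils down to sampling the same counter twice and seeing whether it moved. A minimal sketch of that logic in shell, with hypothetical byte counts standing in for two `nodetool netstats` samples taken some time apart on one replica:

```shell
# Hypothetical counter values from two netstats samples on one replica;
# in practice each would be read from `nodetool netstats` output.
bytes_before=1048576
bytes_after=1048576   # unchanged between samples

# A counter that moved means the stream is progressing; a flat counter
# over a long enough interval suggests that replica's stream is stuck.
if [ "$bytes_after" -gt "$bytes_before" ]; then
    echo "stream is making progress"
else
    echo "stream appears stuck on this replica"
fi
```

The interval between samples matters: a large SSTable can take a while between counter updates, so compare over tens of seconds rather than back-to-back.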
Thank you - I was just impatient. :)
-Joe
On 1/23/2023 12:56 PM, Jeff Jirsa wrote: