Re: Failed disks - correct procedure

2023-01-23 Thread Joe Obernberger
Some more observations. If the first drive in the list fails on a node, you can't just remove it from the list. For example, we have: /data/1/cassandra /data/2/cassandra /data/3/cassandra /data/4/cassandra ... If /data/1 fails, and I remove it from the list, when you try to start cassandra on that node

Re: Cassandra nightly process

2023-01-23 Thread Loïc CHANEL via user
Thanks for your help, guys. You were right: the problem actually came from a very heavy data processing job that runs every 2 hours starting at midnight. Processing performance was heavily affected, causing one node to write hints because communication with the other node was degraded. Best

removenode stuck - cassandra 4.1.0

2023-01-23 Thread Joe Obernberger
I had a drive fail (first drive in the list) on a Cassandra cluster. I've stopped the node (as it no longer starts) and am trying to remove it from the cluster, but the removenode command is hung (it has been running for 3 hours so far): `nodetool removenode status` is always reporting the same token

Re: removenode stuck - cassandra 4.1.0

2023-01-23 Thread Jeff Jirsa
Those hosts are likely sending streams. If you do `nodetool netstats` on the replicas of the node you're removing, you should see byte counters and file counters - they should all be incrementing. If one of them isn't incrementing, that one is probably stuck. There's at least one bug in 4.1 that
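
The check described in this reply (sample the stream counters twice and look for any that did not move) can be sketched roughly as below. This is only an illustration: it does not parse real `nodetool netstats` output, and the function name `stalled_peers`, the dictionary layout, and the sample addresses and byte counts are all hypothetical. It assumes you have already copied the per-peer byte counters from two runs of `nodetool netstats` taken some time apart.

```python
from typing import Dict, List


def stalled_peers(before: Dict[str, int], after: Dict[str, int]) -> List[str]:
    """Return peers whose byte counter did not increase between two samples.

    `before` and `after` map a peer address to the bytes-transferred counter
    read by hand from two separate runs of `nodetool netstats`.
    Peers that only appear in `before` (for example because their stream
    finished) are ignored. Peers new in `after` are compared against 0.
    """
    return sorted(
        peer for peer, sent in after.items() if sent <= before.get(peer, 0)
    )


if __name__ == "__main__":
    # Hypothetical counters, for illustration only.
    first_sample = {"10.0.0.1": 1_000_000, "10.0.0.2": 5_000_000}
    second_sample = {"10.0.0.1": 1_750_000, "10.0.0.2": 5_000_000}
    print(stalled_peers(first_sample, second_sample))
```

A peer that shows up in the result is only a candidate for being stuck; a longer gap between the two samples makes a false alarm less likely.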

Re: removenode stuck - cassandra 4.1.0

2023-01-23 Thread Joe Obernberger
Thank you - I was just impatient.  :) -Joe On 1/23/2023 12:56 PM, Jeff Jirsa wrote: Those hosts are likely sending streams. If you do `nodetool netstats` on the replicas of the node you're removing, you should see byte counters and file counters - they should all be incrementing. If one of