What exactly do you mean by "Cassandra stopped responding ... to nodetool requests"? Was it one specific nodetool command (e.g. "nodetool status") or all nodetool commands? And how did it fail: with an error message such as "connection refused", or did the command hang without returning?
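
If it helps to tell "connection refused" apart from a hang next time, here is a minimal sketch of a TCP probe. The function name probe is my own, and it assumes JMX is still on Cassandra's default port 7199 (9042 is the native transport port you mentioned). Note that an "open" result only shows the port accepts connections; nodetool could still hang after connecting.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connection attempt as 'open', 'refused' or 'timeout'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Nothing is listening on the port (the listener was shut down).
        return "refused"
    except socket.timeout:
        # No answer at all: hung process, dropped packets, or a network problem.
        return "timeout"
    except OSError as exc:
        return f"error: {exc}"


# Example (addresses from the original report, 7199 is an assumption):
#   for port in (9042, 7199):
#       print(port, probe("192.168.0.44", port))
```

A "refused" on 9042 together with "open" on 7199 would point towards the native transport having been stopped, while "timeout" on both would point towards a frozen process or a network problem.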

It's common to see Cassandra shut down gossip and the native transport after disk I/O errors or data corruption when the default disk failure policy "stop" is in effect, but that should not shut down the JMX port used by nodetool. If this is what happened, the "nodetool info" output will clearly show that both gossip and native transport are not active.
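
As a rough sketch of that check, the snippet below pulls the two relevant flags out of "nodetool info" output. The function name transport_status is hypothetical, and the sample text in the comment is only an illustration of the output format, not taken from your nodes.

```python
def transport_status(nodetool_info_output: str) -> dict:
    """Extract the gossip and native transport flags from `nodetool info` text."""
    wanted = ("Gossip active", "Native Transport active")
    status = {}
    for line in nodetool_info_output.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a "key : value" line
        key = key.strip()
        if key in wanted:
            status[key] = value.strip().lower() == "true"
    return status


# Illustrative input only (not from the poster's cluster):
#   Gossip active          : false
#   Native Transport active: false
#
# To run it against a live node, something like:
#   import subprocess
#   out = subprocess.run(["nodetool", "info"], capture_output=True, text=True).stdout
#   print(transport_status(out))
```

If both values come back False while nodetool itself still answers, the disk failure policy is the likely culprit.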

I can't help but notice that the data file, commit log, etc. directories are all under the "/data" directory, which makes me want to ask: is this directory on shared storage, such as NFS or a SAN? If so, a single storage failure could cause multiple nodes to stop working.
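
One quick way to see what backs "/data" is to look it up in /proc/mounts. The sketch below assumes Linux; mount_for and the NETWORK_FS set are names I made up, and the list of file system types is not exhaustive. It will only flag network file systems such as NFS. A SAN LUN normally shows up as an ordinary local file system (ext4, xfs, ...), so for that you would have to check the underlying device instead.

```python
# File system types treated as network storage here (an incomplete, assumed list).
NETWORK_FS = {"nfs", "nfs4", "cifs", "glusterfs", "ceph"}


def mount_for(path: str, mounts_text: str):
    """Return (device, mount_point, fs_type) of the mount that contains path."""
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        device, mount_point, fs_type = fields[:3]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # Prefer the longest (most specific) mount point; on a tie the
            # later entry wins because it shadows the earlier one.
            if best is None or len(mount_point) >= len(best[1]):
                best = (device, mount_point, fs_type)
    return best


# Example on a node:
#   with open("/proc/mounts") as fh:
#       device, mount_point, fs_type = mount_for("/data", fh.read())
#   print(device, mount_point, fs_type, fs_type in NETWORK_FS)
```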

In addition to the above, are you sure you are looking at the correct log files? The timestamps in the 3 log files you provided don't line up. The last line in cassandra.log is from 23 Oct, the entries in system.log fall between 15:00 and 15:31 on 30 Oct, and the first line in gc.log is at 15:41. There's no overlapping time window between any of the log files.
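
To spell out the arithmetic, here is a small sketch using the times mentioned in this thread. It assumes the gc.log entry is also from 30 Oct 2023 and that all logs and the reported failure time use the same time zone, neither of which I can confirm from the files.

```python
from datetime import datetime, timedelta

# Times as described in this thread (year taken from the original report).
system_log_end = datetime(2023, 10, 30, 15, 31)    # last entry in system.log
gc_log_start = datetime(2023, 10, 30, 15, 41)      # first entry in gc.log (date assumed)
reported_failure = datetime(2023, 10, 30, 15, 38)  # "approximately 15:38 UTC"

# Length of the window that none of the provided logs covers.
gap = gc_log_start - system_log_end

# True if the reported failure time falls inside that uncovered window.
failure_in_gap = system_log_end < reported_failure < gc_log_start

print(f"uncovered window: {gap}, failure inside it: {failure_in_gap}")
```

In other words, under those assumptions the reported failure time sits in a 10 minute window that none of the attached logs covers.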

On 31/10/2023 23:56, Ben Klein wrote:
On October 30, 2023, at approximately 15:38 UTC, Cassandra stopped responding to TCP pings on port 9042 and to nodetool requests. However, systemd reported that it was still online. The first node to fail was the seed node (192.168.0.44), followed within the next couple minutes by the other two (192.168.0.15 and 192.168.0.20). Looking through the logs on the first node, I did not see anything out of the ordinary. When the service was restarted (through systemd), it came up with no problem, but this is the second time this has happened in the last month.

I have attached all of the log files from the primary node. I have also attached the cassandra.yaml file, which is the same on all three nodes.

What could possibly be causing this? Is there anything else that I should be looking at?
