What do you mean by saying "Cassandra stopped responding ... to nodetool
requests"? Is it a specific nodetool command (e.g. "nodetool status") or
all nodetool commands? What exactly happened? Was there an error
message, such as "connection refused", or did the command hang without
any response?
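
To tell "connection refused" apart from a hang the next time this
happens, something like the following rough Python sketch could be run
from another machine. The helper name probe() is my own, and 7199 is
assumed to be the JMX port (the Cassandra default); adjust it if
cassandra-env.sh says otherwise. Note that "open" only means the TCP
handshake completed, not that the process behind it is healthy.

```python
import socket


def probe(host, port, timeout=3.0):
    """Classify a TCP connection attempt.

    Returns "open", "refused", "timeout", or "error: <details>".
    probe() is a hypothetical helper, not part of Cassandra or nodetool.
    """
    try:
        # create_connection performs the TCP handshake and nothing more
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # nothing is listening on the port
        return "refused"
    except socket.timeout:
        # no answer at all within the timeout
        return "timeout"
    except OSError as exc:
        # anything else, e.g. no route to host
        return f"error: {exc}"


# Example usage (addresses taken from the original message):
#   probe("192.168.0.44", 9042)   # native transport
#   probe("192.168.0.44", 7199)   # JMX, assuming the default port
```

A "refused" on 9042 together with an "open" on 7199 would point towards
the native transport having been stopped while the JVM is still alive.
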
It's common to see Cassandra shut down gossip and the native transport
due to disk IO errors or data corruption with the default disk failure
policy "stop", but that should not shut down the JMX port used by
nodetool. If this is the case, the "nodetool info" output will clearly
show that both gossip and the native transport are not active.
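
If you want to check this automatically, a small Python sketch along
these lines could scan saved "nodetool info" output. The function name
is my own, and the sample text is only an illustration of the two lines
of interest, not output from your cluster; the exact spacing may differ
between Cassandra versions.

```python
def inactive_services(nodetool_info_text):
    """Return the services that 'nodetool info' reports as not active.

    Only the "Gossip active" and "Native Transport active" lines are
    inspected; every other line is ignored.
    """
    wanted = ("Gossip active", "Native Transport active")
    inactive = []
    for line in nodetool_info_text.splitlines():
        # split "key : value" at the first colon
        key, sep, value = line.partition(":")
        if sep and key.strip() in wanted and value.strip().lower() != "true":
            inactive.append(key.strip())
    return inactive


# Illustrative sample only (assumed layout of the two relevant lines)
sample = """\
Gossip active          : false
Native Transport active: false
"""

print(inactive_services(sample))
```

An empty list means both services are reported as active.
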
I can't help but notice that the data file, commit log, etc. directories
are all under the "/data" directory, which makes me want to ask: is
this directory on shared storage, such as NFS or SAN? If so, a storage
failure may cause multiple nodes to stop working.
In addition to the above, are you sure you are looking at the correct
log files? The timestamps in the 3 log files you provided don't match.
The last line in cassandra.log is from 23 Oct, and the entries in
system.log are between 15:00 and 15:31 on 30 Oct, yet the first line
in gc.log is at 15:41. There's no overlapping time window between any
of the log files.
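
To illustrate what I mean by overlap, here is a minimal Python sketch
that compares the time windows covered by two log files. The system.log
window is the one mentioned above; for gc.log I am assuming the 15:41
start is also on 30 Oct, and its end time is a placeholder I made up,
since I only looked at where the file begins.

```python
from datetime import datetime


def overlap(a, b):
    """Return the (start, end) overlap of two (start, end) windows.

    Returns None when the windows do not overlap at all.
    """
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return (start, end) if start <= end else None


# Window covered by system.log, as described above
system_log = (datetime(2023, 10, 30, 15, 0), datetime(2023, 10, 30, 15, 31))

# gc.log: start assumed to be on 30 Oct; the end time is a placeholder
gc_log = (datetime(2023, 10, 30, 15, 41), datetime(2023, 10, 30, 16, 0))

print(overlap(system_log, gc_log))
```

Logs that do cover the same incident should give a non-empty window
that includes the time of the failure (around 15:38 UTC in your case).
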
On 31/10/2023 23:56, Ben Klein wrote:
On October 30, 2023, at approximately 15:38 UTC, Cassandra stopped
responding to TCP pings on port 9042 and to nodetool requests.
However, systemd reported that it was still online. The first node to
fail was the seed node (192.168.0.44), followed within the next couple
minutes by the other two (192.168.0.15 and 192.168.0.20). Looking
through the logs on the first node, I did not see anything out of the
ordinary. When the service was restarted (through systemd), it came up
with no problem, but this is the second time this has happened in the
last month.
I have attached all of the log files from the primary node. I have
also attached the cassandra.yaml file, which is the same on all three
nodes.
What could possibly be causing this? Is there anything else that I
should be looking at?