Andrew Jorgensen created CASSANDRA-11842:
--------------------------------------------
Summary: Unbounded commit log file growth
Key: CASSANDRA-11842
URL: https://issues.apache.org/jira/browse/CASSANDRA-11842
Project: Cassandra
Issue Type: Bug
Environment: Cassandra version 3.0.3 on Ubuntu Trusty
Reporter: Andrew Jorgensen
Attachments: disks-space.png
Today I noticed that 2 nodes in a 54 node cluster have been using up disk space
at a constant rate for the last 3 days or so.
!disks-space.png|thumbnail!
When I looked into it I found that the majority of the disk space was being
used up in /mnt/cassandra/commitlog. It looked like there were files dating
back to when the disk usage started to increase on 5/16 and there were a total
of ~13K commit log files in this directory.
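For anyone who wants to gather the same numbers on their own node, the following is a minimal sketch rather than an official tool. It counts the segment files in a commit log directory, sums their size, and reports the oldest modification time. The function name `summarize_commitlog` is hypothetical, and the sketch assumes segment files end in `.log`.

```python
import os
from datetime import datetime, timezone


def summarize_commitlog(directory):
    """Return (file_count, total_bytes, oldest_mtime) for the directory.

    Assumption: commit log segments are regular files ending in ".log".
    oldest_mtime is a UTC datetime, or None if no segment was found.
    """
    count = 0
    total_bytes = 0
    oldest = None
    with os.scandir(directory) as entries:
        for entry in entries:
            # Skip subdirectories and anything that is not a segment file.
            if not entry.is_file() or not entry.name.endswith(".log"):
                continue
            info = entry.stat()
            count += 1
            total_bytes += info.st_size
            if oldest is None or info.st_mtime < oldest:
                oldest = info.st_mtime
    oldest_dt = None
    if oldest is not None:
        oldest_dt = datetime.fromtimestamp(oldest, tz=timezone.utc)
    return count, total_bytes, oldest_dt
```

Calling `summarize_commitlog("/mnt/cassandra/commitlog")` on an affected node would give the file count, the total size in bytes, and the date of the oldest segment, which can be compared against the date the disk usage started to climb.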
I was curious if anyone has seen this before. I am not sure what would cause
this behavior, especially on two separate nodes in the cluster at about the
same time. I think this points to something about the data: we have a
replication factor of 2, which matches the number of nodes that were
affected.
The two nodes in question looked down from every other node in the cluster's
perspective when running `nodetool status`, but when running it on the affected
nodes the entire cluster looked like it was up and running.
To remedy the situation I tried running `nodetool drain` on one of the affected
nodes, but it seemed to hang and I couldn't tell whether it was doing
anything. I restarted the Cassandra process and could see in the debug
log that it was reading in the commit log files. On the second node I moved the
commit log folder to a different location and restarted the node, which caused it
to rejoin the cluster immediately. I can replay the queued-up commit log files
later to make sure it is in a consistent state. So far the commit log on that
node no longer appears to be growing without bound.
As far as I could tell, the data in /mnt/cassandra/data/ for each of the
keyspaces and tables had recent timestamps on the files, which I believe means
that flushing was happening and data was getting written to the SSTables. Also,
350GB of commit log wouldn't have been able to fit into memory.
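The timestamp check described above can be scripted. This is a minimal sketch under stated assumptions: the helper names `newest_mtime` and `flushed_recently` are hypothetical, and the age threshold is a placeholder to be chosen by the operator. Note that a recent modification time only shows that something wrote to the data directory; compaction also writes SSTable files, so this is a hint that flushing is happening, not proof.

```python
import os
import time


def newest_mtime(root):
    """Return the newest file modification time under root, or None if empty."""
    newest = None
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                mtime = os.stat(os.path.join(dirpath, name)).st_mtime
            except FileNotFoundError:
                # The file was removed while walking (e.g. by compaction).
                continue
            if newest is None or mtime > newest:
                newest = mtime
    return newest


def flushed_recently(root, max_age_seconds, now=None):
    """True if any file under root was modified within max_age_seconds."""
    now = time.time() if now is None else now
    newest = newest_mtime(root)
    return newest is not None and (now - newest) <= max_age_seconds
```

For example, `flushed_recently("/mnt/cassandra/data", 3600)` would report whether any file under the data directory changed in the last hour.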
If there is any other information I can provide, please let me know. I didn't see
much in the Cassandra system.log or debug.log files, but I would be happy to
provide them if it will help.