Andrew Jorgensen created CASSANDRA-11842:
--------------------------------------------

             Summary: Unbounded commit log file growth
                 Key: CASSANDRA-11842
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11842
             Project: Cassandra
          Issue Type: Bug
         Environment: Cassandra version 3.0.3 on Ubuntu Trusty
            Reporter: Andrew Jorgensen
         Attachments: disks-space.png

Today I noticed that 2 nodes in a 54 node cluster have been using up disk space 
at a constant rate for the last 3 days or so. 

!disks-space.png|thumbnail!

When I looked into it I found that the majority of the disk space was being 
used up in /mnt/cassandra/commitlog. It looked like there were files dating 
back to when the disk usage started to increase on 5/16 and there were a total 
of ~13K commit log files in this directory.
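For anyone who wants to check their own nodes for the same pattern, here is a
minimal sketch that groups the files in a directory by modification date and
reports the count and total size per day. The function names
`summarize_by_day` and `report` are made up for illustration; the only thing
taken from this report is the commit log path.

```python
import os
from collections import defaultdict
from datetime import date


def summarize_by_day(directory):
    """Return {date: [file_count, total_bytes]} keyed by each file's mtime date."""
    summary = defaultdict(lambda: [0, 0])
    with os.scandir(directory) as entries:
        for entry in entries:
            # Only regular files are counted; subdirectories are skipped.
            if not entry.is_file():
                continue
            info = entry.stat()
            day = date.fromtimestamp(info.st_mtime)  # local time zone
            summary[day][0] += 1
            summary[day][1] += info.st_size
    return dict(summary)


def report(directory):
    """Format the per-day summary as one text line per day, oldest first."""
    lines = []
    for day, (count, size) in sorted(summarize_by_day(directory).items()):
        lines.append(f"{day}  {count:6d} files  {size / 2**30:8.2f} GiB")
    return lines
```

Calling `print("\n".join(report("/mnt/cassandra/commitlog")))` on an affected
node should show whether the file count per day jumps at the point where disk
usage started to climb.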

I am curious whether anyone has seen this before. I am not sure what would
cause this behavior, especially on two separate nodes in the cluster at about
the same time. I think this points to something about the data: we have a
replication factor of 2, which matches the number of nodes that were
affected.

From the perspective of every other node in the cluster, the two nodes in
question looked down when running `nodetool status`, but when running it on
the affected nodes the entire cluster looked like it was up and running.

To remedy the situation I tried running `nodetool drain` on one of the affected
nodes, but it seemed to hang and I couldn't tell whether it was doing anything.
I restarted the cassandra process and could see in the debug log that it was
reading in the commit log files. On the second node I moved the commit log
folder to a different location and restarted the node, which caused it to
immediately rejoin the cluster. I can replay the commit log files that were
queued up later to make sure it's in a consistent state. So far the commit log
on that node no longer appears to be growing without bound.

As far as I could tell, the files in /mnt/cassandra/data/ for each of the
keyspaces and tables had recent timestamps, which I believe means that
flushing was happening and data was getting written to the SSTables. Also,
350GB of commitlog wouldn't have been able to fit into memory.
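The timestamp check can be repeated with a rough sketch like the one below,
which walks a data directory and lists the subdirectories whose newest file is
older than a given age. The helper names and the age threshold are
hypothetical. A recent modification time only suggests that flushes or
compactions are still writing files; it does not prove that every table is
being flushed.

```python
import os
import time


def newest_mtime_by_dir(root):
    """Map each directory under root that holds files to its newest file mtime."""
    newest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        mtimes = []
        for name in filenames:
            try:
                mtimes.append(os.path.getmtime(os.path.join(dirpath, name)))
            except OSError:
                # Files can disappear while walking a live data directory.
                continue
        if mtimes:
            newest[dirpath] = max(mtimes)
    return newest


def stale_dirs(root, max_age_seconds, now=None):
    """Return directories whose newest file is older than max_age_seconds."""
    now = time.time() if now is None else now
    return sorted(
        path
        for path, mtime in newest_mtime_by_dir(root).items()
        if now - mtime > max_age_seconds
    )
```

For example, `stale_dirs("/mnt/cassandra/data", 24 * 3600)` would list the
directories with no file written in the last day, which is where I would look
first for tables that are not being flushed.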

If there is any other information I can provide, please let me know. I didn't
see much in the cassandra system.log or debug.log files but would be happy to
provide them if it will help.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
