[ 
https://issues.apache.org/jira/browse/CASSANDRA-11842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Jorgensen updated CASSANDRA-11842:
-----------------------------------------
    Description: 
Today I noticed that 2 nodes in a 54-node cluster have been using up disk space 
at a constant rate for the last 3 days or so. 

This is a graph of disk space over the last 4 days for each of the nodes in our 
Cassandra cluster.
!disks-space.png|thumbnail!

When I looked into it I found that the majority of the disk space was being 
used up in /mnt/cassandra/commitlog. It looked like there were files dating 
back to when the disk usage started to increase on 5/16 and there were a total 
of ~13K commit log files in this directory.
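
For anyone who wants to check for the same symptom, the file count and dates above can be gathered with a small script. This is only a minimal sketch, not part of Cassandra: the helper name `summarize_commitlog` is made up for illustration, it treats every regular file in the directory as a commit log segment, and it groups files by modification date in UTC.

```python
import os
from collections import Counter
from datetime import datetime, timezone


def summarize_commitlog(directory):
    """Return (file_count, total_bytes, files_per_day) for one directory.

    Hypothetical helper: counts every regular file in `directory` and
    groups the files by their modification date (UTC, ISO format).
    """
    files_per_day = Counter()
    total_bytes = 0
    file_count = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file():
                continue  # skip subdirectories and other non-file entries
            info = entry.stat()
            modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
            files_per_day[modified.date().isoformat()] += 1
            total_bytes += info.st_size
            file_count += 1
    return file_count, total_bytes, dict(files_per_day)


# Example call (the path is the one from this report, adjust as needed):
# count, size, per_day = summarize_commitlog("/mnt/cassandra/commitlog")
```

A per-day count that keeps rising from one date onward, with the oldest files never disappearing, would match the pattern described here.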

I was curious if anyone has seen this before. I am not sure what would cause 
this behavior, especially on two separate nodes in the cluster at about the 
same time. I think this points to something about the data: we have a 
replication factor of 2, which matches the number of nodes that were affected.

The two nodes in question appeared down from the perspective of every other 
node in the cluster when running `nodetool status`, but when I ran it on the 
affected nodes the entire cluster appeared to be up and running.

To remedy the situation I tried running `nodetool drain` on one of the affected 
nodes, but it seemed to hang and I couldn't tell whether it was doing anything. 
I restarted the Cassandra process and could see in the debug log that it was 
reading in the commit log files. On the second node I moved the commit log 
folder to a different location and restarted the node, which caused it to 
rejoin the cluster immediately. I can replay the queued-up commit log files 
later to make sure it's in a consistent state. So far it looks like the commit 
log on that node is no longer growing unboundedly.

As far as I could tell, the data files in /mnt/cassandra/data/ for each of the 
keyspaces and tables had recent timestamps, which I believe means that flushing 
was happening and data was getting written to the SSTables. Also, 350GB of 
commit log wouldn't have been able to fit into memory.

If there is any other information I can provide, please let me know. I didn't 
see much in the Cassandra system.log or debug.log files, but I would be happy 
to provide them if it'll help.

  was:
Today I noticed that 2 nodes in a 54-node cluster have been using up disk space 
at a constant rate for the last 3 days or so. 

!disks-space.png|thumbnail!

When I looked into it I found that the majority of the disk space was being 
used up in /mnt/cassandra/commitlog. It looked like there were files dating 
back to when the disk usage started to increase on 5/16 and there were a total 
of ~13K commit log files in this directory.

I was curious if anyone has seen this before. I am not sure what would cause 
this behavior, especially on two separate nodes in the cluster at about the 
same time. I think this points to something about the data: we have a 
replication factor of 2, which matches the number of nodes that were affected.

The two nodes in question appeared down from the perspective of every other 
node in the cluster when running `nodetool status`, but when I ran it on the 
affected nodes the entire cluster appeared to be up and running.

To remedy the situation I tried running `nodetool drain` on one of the affected 
nodes, but it seemed to hang and I couldn't tell whether it was doing anything. 
I restarted the Cassandra process and could see in the debug log that it was 
reading in the commit log files. On the second node I moved the commit log 
folder to a different location and restarted the node, which caused it to 
rejoin the cluster immediately. I can replay the queued-up commit log files 
later to make sure it's in a consistent state. So far it looks like the commit 
log on that node is no longer growing unboundedly.

As far as I could tell, the data files in /mnt/cassandra/data/ for each of the 
keyspaces and tables had recent timestamps, which I believe means that flushing 
was happening and data was getting written to the SSTables. Also, 350GB of 
commit log wouldn't have been able to fit into memory.

If there is any other information I can provide, please let me know. I didn't 
see much in the Cassandra system.log or debug.log files, but I would be happy 
to provide them if it'll help.


> Unbounded commit log file growth
> --------------------------------
>
>                 Key: CASSANDRA-11842
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-11842
>             Project: Cassandra
>          Issue Type: Bug
>         Environment: Cassandra version 3.0.3 on Ubuntu Trusty
>            Reporter: Andrew Jorgensen
>         Attachments: disks-space.png



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
