[
https://issues.apache.org/jira/browse/CASSANDRA-5605?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13770869#comment-13770869
]
Jonathan Ellis commented on CASSANDRA-5605:
-------------------------------------------
bq. we still need to add an error handler to FlushRunnable so the postExecutor does not get blocked
I'm not sure we want to unblock it -- if the flush errors out, then we
definitely don't want commitlog segments getting cleaned up. What did you have
in mind?
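
For illustration, here is a minimal sketch of the first option (hypothetical names, not the actual FlushRunnable/postFlushExecutor code and not a proposed patch): the flush outcome is captured in a Future, so the post-flush task can observe a failure and return instead of waiting on a signal that never arrives, while commitlog recycling stays gated on flush success.

{code:java}
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FlushErrorHandlingSketch
{
    private static final ExecutorService flushWriter = Executors.newSingleThreadExecutor();
    private static final ExecutorService postFlushExecutor = Executors.newSingleThreadExecutor();

    public static void flushAndRecycle(Runnable writeSSTable, final Runnable recycleSegments)
    {
        // Any exception thrown by the flush is captured inside the Future
        // rather than escaping its thread silently.
        final Future<?> flush = flushWriter.submit(writeSSTable);

        postFlushExecutor.submit(new Runnable()
        {
            public void run()
            {
                try
                {
                    flush.get(); // rethrows a flush failure as ExecutionException
                    recycleSegments.run(); // discard commitlog data only on success
                }
                catch (InterruptedException e)
                {
                    Thread.currentThread().interrupt();
                }
                catch (ExecutionException e)
                {
                    // Flush failed: keep the segments for replay, but return
                    // so queued post-flush tasks are not blocked behind us.
                    System.err.println("Flush failed, retaining commitlog segments: " + e.getCause());
                }
            }
        });
    }
}
{code}

The point of the sketch is that the segments survive for replay even though the post-flush executor itself is not left permanently wedged; whether that is the right behavior is exactly the open question above.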
> Crash caused by insufficient disk space to flush
> ------------------------------------------------
>
> Key: CASSANDRA-5605
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5605
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.2.5
> Environment: java version "1.7.0_15"
> Reporter: Dan Hendry
> Assignee: Jonathan Ellis
> Priority: Minor
> Fix For: 1.2.10, 2.0.1
>
>
> A few times now I have seen our Cassandra nodes crash by running themselves
> out of memory. It starts with the following exception:
> {noformat}
> ERROR [FlushWriter:13000] 2013-05-31 11:32:02,350 CassandraDaemon.java (line 164) Exception in thread Thread[FlushWriter:13000,5,main]
> java.lang.RuntimeException: Insufficient disk space to write 8042730 bytes
>         at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:42)
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>         at java.lang.Thread.run(Thread.java:722)
> {noformat}
> After this, the MemtablePostFlusher stage seems to get stuck and no further
> memtables get flushed:
> {noformat}
> INFO [ScheduledTasks:1] 2013-05-31 11:59:12,467 StatusLogger.java (line 68) MemtablePostFlusher                       1        32         0
> INFO [ScheduledTasks:1] 2013-05-31 11:59:12,469 StatusLogger.java (line 73) CompactionManager                         1         2
> {noformat}
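>
> For what it's worth, the following standalone sketch (simplified, hypothetical classes, not the real FlushWriter/MemtablePostFlusher code) reproduces the shape of the hang: the single-threaded post-flush stage waits on a signal from the flush, so a flush that throws before signalling blocks the head of the queue and every task behind it.
>
> {code:java}
> import java.util.concurrent.CountDownLatch;
> import java.util.concurrent.ExecutorService;
> import java.util.concurrent.Executors;
>
> public class PostFlushDeadlockSketch
> {
>     static boolean freeSpaceLooksInsufficient() { return true; }
>
>     public static void main(String[] args)
>     {
>         final CountDownLatch flushed = new CountDownLatch(1);
>         ExecutorService flushWriter = Executors.newSingleThreadExecutor();
>         ExecutorService postFlusher = Executors.newSingleThreadExecutor(); // single thread, like MemtablePostFlusher
>
>         flushWriter.submit(new Runnable()
>         {
>             public void run()
>             {
>                 // Simulates the free-space check failing before the flush
>                 // ever signals completion.
>                 if (freeSpaceLooksInsufficient())
>                     throw new RuntimeException("Insufficient disk space to write 8042730 bytes");
>                 flushed.countDown(); // never reached in this scenario
>             }
>         });
>
>         postFlusher.submit(new Runnable()
>         {
>             public void run()
>             {
>                 try
>                 {
>                     flushed.await(); // blocks forever; later tasks queue behind it
>                 }
>                 catch (InterruptedException e)
>                 {
>                     Thread.currentThread().interrupt();
>                 }
>             }
>         });
>         // The process now hangs: the post-flush stage shows 1 active task
>         // and an ever-growing pending count, as in the StatusLogger output.
>     }
> }
> {code}
>
> Because the stage is single-threaded, the pending count can only climb, and unflushed memtables accumulate until the node runs out of memory, which is how our crashes start.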
> What makes this ridiculous is that, at the time, the data directory on this
> node had 981GB of free disk space (as reported by du). We primarily use STCS,
> and when the aforementioned exception occurred, at least one compaction task
> was executing that could easily have involved 981GB (or more) worth of input
> SSTables. Correct me if I am wrong, but Cassandra counts data currently being
> compacted against available disk space. In our case, this significantly
> overestimates the space required by compaction, since a large portion of the
> data being compacted has expired or is an overwrite. More to the point,
> Cassandra should not crash because it is out of disk space unless it is
> really, actually out of disk space (i.e., don't count 'phantom' compaction
> disk usage when flushing). I have seen one of our nodes die this way before
> our disk space alerts even went off.
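>
> To make the accounting concrete, here is a rough sketch (illustrative names, not the actual 1.2 code) of a free-space check that charges the full input size of in-flight compactions against the disk; with ~981GB of inputs "reserved", even an 8MB flush fails despite the filesystem having ~981GB actually free:
>
> {code:java}
> import java.io.File;
>
> public class DiskSpaceCheckSketch
> {
>     // Hypothetical running total of bytes "reserved" by active compactions
>     // (full input size, before expired/overwritten data shrinks the output).
>     static long inFlightCompactionBytes = 981L * 1024 * 1024 * 1024;
>
>     static boolean hasSpaceToFlush(File dataDir, long flushBytes)
>     {
>         long usable = dataDir.getUsableSpace();
>         // The flush is charged for space the compaction may never consume.
>         return usable - inFlightCompactionBytes >= flushBytes;
>     }
>
>     public static void main(String[] args)
>     {
>         File dataDir = new File("/var/lib/cassandra/data");
>         // With ~981GB actually free, an 8042730-byte flush still fails.
>         System.out.println(hasSpaceToFlush(dataDir, 8042730L));
>     }
> }
> {code}
>
> Charging the full input size assumes the compaction output will be as large as its inputs, which is exactly the overestimate described above.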
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira