Stefania created CASSANDRA-10538:
------------------------------------
Summary: Assertion failed in LogFile when disk is full
Key: CASSANDRA-10538
URL: https://issues.apache.org/jira/browse/CASSANDRA-10538
Project: Cassandra
Issue Type: Bug
Reporter: Stefania
Assignee: Stefania
Fix For: 3.x
Attachments:
ma_txn_compaction_67311da0-72b4-11e5-9eb9-b14fa4bbe709.log,
ma_txn_compaction_696059b0-72b4-11e5-9eb9-b14fa4bbe709.log,
ma_txn_compaction_8ac58b70-72b4-11e5-9eb9-b14fa4bbe709.log,
ma_txn_compaction_8be24610-72b4-11e5-9eb9-b14fa4bbe709.log,
ma_txn_compaction_95500fc0-72b4-11e5-9eb9-b14fa4bbe709.log,
ma_txn_compaction_a41caa90-72b4-11e5-9eb9-b14fa4bbe709.log
[~carlyeks] was running a stress job which filled up the disk. At the end of
the system logs there are several assertion errors:
{code}
ERROR [CompactionExecutor:1] 2015-10-14 20:46:55,467 CassandraDaemon.java:195 - Exception in thread Thread[CompactionExecutor:1,1,main]
java.lang.RuntimeException: Insufficient disk space to write 2097152 bytes
    at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.getWriteDirectory(CompactionAwareWriter.java:156) ~[main/:na]
    at org.apache.cassandra.db.compaction.writers.MaxSSTableSizeWriter.realAppend(MaxSSTableSizeWriter.java:77) ~[main/:na]
    at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.append(CompactionAwareWriter.java:110) ~[main/:na]
    at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:182) ~[main/:na]
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[main/:na]
    at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:78) ~[main/:na]
    at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:61) ~[main/:na]
    at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:220) ~[main/:na]
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_40]
    at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_40]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_40]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_40]
    at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
INFO [IndexSummaryManager:1] 2015-10-14 21:10:40,099 IndexSummaryManager.java:257 - Redistributing index summaries
ERROR [IndexSummaryManager:1] 2015-10-14 21:10:42,275 CassandraDaemon.java:195 - Exception in thread Thread[IndexSummaryManager:1,1,main]
java.lang.AssertionError: Already completed!
    at org.apache.cassandra.db.lifecycle.LogFile.abort(LogFile.java:221) ~[main/:na]
    at org.apache.cassandra.db.lifecycle.LogTransaction.doAbort(LogTransaction.java:376) ~[main/:na]
    at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.abort(Transactional.java:144) ~[main/:na]
    at org.apache.cassandra.db.lifecycle.LifecycleTransaction.doAbort(LifecycleTransaction.java:259) ~[main/:na]
    at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.abort(Transactional.java:144) ~[main/:na]
    at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.abort(Transactional.java:193) ~[main/:na]
    at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.close(Transactional.java:158) ~[main/:na]
    at org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries(IndexSummaryManager.java:242) ~[main/:na]
    at org.apache.cassandra.io.sstable.IndexSummaryManager$1.runMayThrow(IndexSummaryManager.java:134) ~[main/:na]
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[main/:na]
    at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolE
{code}
We should not use an assertion for a condition that can occur when the disk is
full; we should throw a runtime exception instead.
I would also like to understand exactly what triggered the assertion.
{{LifecycleTransaction}} can throw at the beginning of the commit method if it
cannot write the record to disk. In that case, all we have to do is ensure that
we update the records in memory after writing to disk (currently we update them
before). However, I am not sure this is what happened here; it looks more like
abort was called twice, which should never happen.
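
To make the two ideas above concrete, here is a minimal sketch. It is not the actual {{LogFile}} or {{LogTransaction}} code: {{TxnLogSketch}}, {{RecordSink}}, and the record strings are hypothetical names used only for illustration. It assumes the completion record is written to disk before the in-memory state is touched, and that a second completion attempt raises a runtime exception instead of tripping an assertion.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a transaction log; NOT Cassandra's LogFile. */
final class TxnLogSketch {
    /** Hypothetical record sink; a real one would append to a file and sync it. */
    interface RecordSink {
        void write(String record) throws IOException;
    }

    private final RecordSink sink;
    private final List<String> records = new ArrayList<>(); // in-memory view of the log
    private boolean completed;

    TxnLogSketch(RecordSink sink) {
        this.sink = sink;
    }

    void commit() { complete("COMMIT"); }

    void abort() { complete("ABORT"); }

    private void complete(String record) {
        // A runtime exception rather than an assert: enforced even without -ea.
        if (completed)
            throw new IllegalStateException("Already completed!");
        try {
            sink.write(record); // 1. write to disk first; this can fail on a full disk
        } catch (IOException e) {
            throw new UncheckedIOException("Failed to write " + record + " record", e);
        }
        records.add(record);    // 2. update in-memory state only after the write succeeded
        completed = true;
    }

    boolean isCompleted() { return completed; }

    List<String> records() { return records; }

    public static void main(String[] args) {
        int[] attempts = {0};
        // Simulated disk: the first write fails, later writes succeed.
        TxnLogSketch log = new TxnLogSketch(r -> {
            if (attempts[0]++ == 0)
                throw new IOException("No space left on device");
        });
        try {
            log.commit();
        } catch (UncheckedIOException e) {
            System.out.println("commit failed: " + e.getCause().getMessage());
        }
        System.out.println("completed after failed commit: " + log.isCompleted());
        log.abort(); // still allowed, because the failed commit left memory untouched
        System.out.println("records: " + log.records());
    }
}
```

With this ordering, a commit that fails on a full disk leaves the in-memory records unchanged, so a later abort is not rejected as already completed. It does not explain a genuine double abort, which would still surface here as an {{IllegalStateException}}.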