[ https://issues.apache.org/jira/browse/CASSANDRA-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001830#comment-15001830 ]
Stefania commented on CASSANDRA-10538:
--------------------------------------

bq. How does Throwables.perform handle AssertionError? It looks like it swallows it? Seems like AssertionError shouldn't be caught and should be allowed to terminate the JVM?

It merges it and passes it to the caller like any other {{Throwable}}; I don't think we should change this.

{quote}
To make sure I understand the fix. The issue was that we marked something committed in memory when committing (or aborting) fails to persist to disk because the disk is full. The fix was to write to disk first then memory, and if writing to disk for commit fails we can hit the abort path and then that can fail as well. Or is this hitting abort and abort like you would expect given that the disk is full and the transaction probably can't complete successfully?
{quote}

The fix is to update memory only after the disk has been updated, so that we can try again later and memory reflects the correct on-disk status. Before the fix, the assertion would have prevented retrying. If the disk is full, the abort record will still not be added, not even on the second attempt during the final close, but that's OK since a missing final record means the transaction was aborted anyway.

Also, neither commit nor abort should throw: it's the caller of commit that may decide to abort ({{LifecycleTransaction}}), but {{LogTransaction}} should not throw during commit or abort.
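To illustrate the ordering described in the comment above, here is a minimal sketch, not the actual patch. {{DiskFirstLog}}, {{Appender}} and {{merge}} are hypothetical names invented for this example and are not Cassandra's {{LogFile}} or {{Throwables}} API. The sketch assumes the final record is appended to disk first and mirrored in memory only on success, and that failures are merged into an accumulated {{Throwable}} and returned rather than thrown.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a transaction log; NOT Cassandra's LogFile. */
final class DiskFirstLog {
    /** Hypothetical abstraction over the disk append, so a full disk can be simulated. */
    interface Appender {
        void append(String record) throws IOException;
    }

    private final Appender disk;
    private final List<String> inMemory = new ArrayList<>();
    private boolean completed;

    DiskFirstLog(Appender disk) {
        this.disk = disk;
    }

    /** Never throws: any failure is merged into {@code accumulate} and returned. */
    Throwable commit(Throwable accumulate) {
        return addFinalRecord("COMMIT", accumulate);
    }

    /** Never throws: any failure is merged into {@code accumulate} and returned. */
    Throwable abort(Throwable accumulate) {
        return addFinalRecord("ABORT", accumulate);
    }

    private Throwable addFinalRecord(String record, Throwable accumulate) {
        if (completed)
            return accumulate; // a final record is already on disk; nothing to do
        try {
            disk.append(record);  // 1. persist to disk first
            inMemory.add(record); // 2. only then reflect it in memory
            completed = true;
        } catch (Throwable t) {
            // Memory is untouched, so a later retry remains possible.
            accumulate = merge(accumulate, t);
        }
        return accumulate;
    }

    /** Keeps the first error and attaches later ones as suppressed. */
    static Throwable merge(Throwable existing, Throwable t) {
        if (existing == null)
            return t;
        existing.addSuppressed(t);
        return existing;
    }

    boolean isCompleted() {
        return completed;
    }

    List<String> records() {
        return inMemory;
    }
}
```

With an {{Appender}} that throws while the disk is full, both {{commit}} and a following {{abort}} return the merged error and leave the in-memory state untouched, so a later attempt can still add the final record once space is available.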

> Assertion failed in LogFile when disk is full
> ---------------------------------------------
>
>                 Key: CASSANDRA-10538
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10538
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Stefania
>            Assignee: Stefania
>             Fix For: 3.x
>
>         Attachments:
>             ma_txn_compaction_67311da0-72b4-11e5-9eb9-b14fa4bbe709.log,
>             ma_txn_compaction_696059b0-72b4-11e5-9eb9-b14fa4bbe709.log,
>             ma_txn_compaction_8ac58b70-72b4-11e5-9eb9-b14fa4bbe709.log,
>             ma_txn_compaction_8be24610-72b4-11e5-9eb9-b14fa4bbe709.log,
>             ma_txn_compaction_95500fc0-72b4-11e5-9eb9-b14fa4bbe709.log,
>             ma_txn_compaction_a41caa90-72b4-11e5-9eb9-b14fa4bbe709.log
>
>
> [~carlyeks] was running a stress job which filled up the disk. At the end of
> the system logs there are several assertion errors:
> {code}
> ERROR [CompactionExecutor:1] 2015-10-14 20:46:55,467 CassandraDaemon.java:195 - Exception in thread Thread[CompactionExecutor:1,1,main]
> java.lang.RuntimeException: Insufficient disk space to write 2097152 bytes
>     at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.getWriteDirectory(CompactionAwareWriter.java:156) ~[main/:na]
>     at org.apache.cassandra.db.compaction.writers.MaxSSTableSizeWriter.realAppend(MaxSSTableSizeWriter.java:77) ~[main/:na]
>     at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.append(CompactionAwareWriter.java:110) ~[main/:na]
>     at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:182) ~[main/:na]
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[main/:na]
>     at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:78) ~[main/:na]
>     at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:61) ~[main/:na]
>     at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:220) ~[main/:na]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_40]
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_40]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_40]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_40]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
> INFO [IndexSummaryManager:1] 2015-10-14 21:10:40,099 IndexSummaryManager.java:257 - Redistributing index summaries
> ERROR [IndexSummaryManager:1] 2015-10-14 21:10:42,275 CassandraDaemon.java:195 - Exception in thread Thread[IndexSummaryManager:1,1,main]
> java.lang.AssertionError: Already completed!
>     at org.apache.cassandra.db.lifecycle.LogFile.abort(LogFile.java:221) ~[main/:na]
>     at org.apache.cassandra.db.lifecycle.LogTransaction.doAbort(LogTransaction.java:376) ~[main/:na]
>     at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.abort(Transactional.java:144) ~[main/:na]
>     at org.apache.cassandra.db.lifecycle.LifecycleTransaction.doAbort(LifecycleTransaction.java:259) ~[main/:na]
>     at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.abort(Transactional.java:144) ~[main/:na]
>     at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.abort(Transactional.java:193) ~[main/:na]
>     at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.close(Transactional.java:158) ~[main/:na]
>     at org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries(IndexSummaryManager.java:242) ~[main/:na]
>     at org.apache.cassandra.io.sstable.IndexSummaryManager$1.runMayThrow(IndexSummaryManager.java:134) ~[main/:na]
>     at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[main/:na]
>     at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolE
> {code}
> We should not have an assertion if it can happen when the disk is full, we
> should rather have a runtime exception.
> I also would like to understand exactly what triggered the assertion.
> {{LifecycleTransaction}} can throw at the beginning of the commit method if
> it cannot write the record to disk, in which case all we have to do is ensure
> we update the records in memory after writing to disk (currently we update
> them before). However, I am not sure this is what happened here, it looks
> more like abort was called twice, which should never happen.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)