[ 
https://issues.apache.org/jira/browse/CASSANDRA-10538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15001830#comment-15001830
 ] 

Stefania commented on CASSANDRA-10538:
--------------------------------------

bq. How does Throwables.perform handle AssertionError? It looks like it swallows 
it? Seems like AssertionError shouldn't be caught and should be allowed to 
terminate the JVM?

It doesn't swallow it: like any other {{Throwable}}, it is merged with any 
previously accumulated failure and passed to the caller. I don't think we 
should change this.
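
For illustration, here is a minimal sketch of the merge-and-return behaviour described above. It assumes the common pattern of attaching later failures to the first one via {{addSuppressed}}. {{ThrowablesSketch}}, {{Action}}, {{perform}}, {{merge}} and {{maybeFail}} are hypothetical names, not Cassandra's actual {{Throwables}} API. The sketch shows that an {{AssertionError}} is caught along with everything else but is not lost: the merged failure is returned, and the caller can still rethrow it.

{code:java}
// Hypothetical sketch of an accumulate-and-merge helper;
// NOT Cassandra's org.apache.cassandra.utils.Throwables.
final class ThrowablesSketch
{
    // A unit of work that may throw any Throwable.
    interface Action
    {
        void run() throws Throwable;
    }

    // Merge newFail into existing: the first failure stays the primary one,
    // later failures are attached to it as suppressed exceptions.
    static Throwable merge(Throwable existing, Throwable newFail)
    {
        if (existing == null)
            return newFail;
        existing.addSuppressed(newFail);
        return existing;
    }

    // Run every action even if earlier ones fail. Any Throwable, including
    // AssertionError, is caught and merged into the accumulated failure,
    // which is returned to the caller rather than discarded.
    static Throwable perform(Throwable accumulate, Action... actions)
    {
        for (Action action : actions)
        {
            try
            {
                action.run();
            }
            catch (Throwable t)
            {
                accumulate = merge(accumulate, t);
            }
        }
        return accumulate;
    }

    // Caller side: rethrow the merged failure, if any, so it still propagates.
    static void maybeFail(Throwable fail)
    {
        if (fail == null)
            return;
        if (fail instanceof RuntimeException)
            throw (RuntimeException) fail;
        if (fail instanceof Error)
            throw (Error) fail;
        throw new RuntimeException(fail);
    }
}
{code}

For example, if the first action throws an {{AssertionError}} and the second an {{IllegalStateException}}, {{perform}} returns the {{AssertionError}} with the {{IllegalStateException}} attached as a suppressed exception. {{maybeFail}} then rethrows it unchanged.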

{quote}
To make sure I understand the fix: the issue was that we marked something as 
committed in memory when committing (or aborting) failed to persist to disk 
because the disk was full. The fix was to write to disk first, then memory, and 
if writing to disk for commit fails we can hit the abort path, and then that can 
fail as well.

Or is this hitting abort and abort, as you would expect given that the disk is 
full and the transaction probably can't complete successfully?
{quote}

The fix is to update memory only once the disk has been updated, so that we 
can retry later on and memory always reflects the correct on-disk status. 
Before the fix, the assertion would have prevented retrying. If the disk is 
full, the abort record will still not be added, not even on the second attempt 
during the final close, but that's OK since a missing final record means the 
transaction was aborted anyway.
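
A minimal sketch of the disk-first, memory-second ordering, under stated assumptions: {{TxnLogSketch}}, its {{Disk}} interface and the {{COMMIT}}/{{ABORT}} record strings are hypothetical and greatly simplified compared with the real {{LogFile}}. Because the in-memory state is updated only after a successful write, a failed write leaves the log "not completed" and a second attempt is allowed instead of tripping the "Already completed!" check. Here that check is written as an explicit runtime exception rather than an assertion.

{code:java}
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, heavily simplified model of a transaction log file;
// NOT Cassandra's LogFile. The Disk interface stands in for real file I/O.
final class TxnLogSketch
{
    interface Disk
    {
        void append(String record) throws IOException;
    }

    private final Disk disk;
    private final List<String> records = new ArrayList<>();
    private boolean completed = false;

    TxnLogSketch(Disk disk)
    {
        this.disk = disk;
    }

    boolean completed()
    {
        return completed;
    }

    List<String> records()
    {
        return Collections.unmodifiableList(records);
    }

    void commit() throws IOException
    {
        addFinalRecord("COMMIT");
    }

    void abort() throws IOException
    {
        addFinalRecord("ABORT");
    }

    private void addFinalRecord(String record) throws IOException
    {
        // Explicit runtime check rather than an assert.
        if (completed)
            throw new IllegalStateException("Already completed!");
        disk.append(record);  // 1. disk first: may throw, e.g. when the disk is full
        records.add(record);  // 2. memory second: reached only after a successful write
        completed = true;
    }
}
{code}

If the disk stays full, every attempt fails and no final record is ever written; per the comment above, the real system still treats that transaction as aborted.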

Also, neither commit nor abort should throw: it's the caller of commit 
({{LifecycleTransaction}}) that may decide to abort, but {{LogTransaction}} 
itself should not throw during commit or abort.
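
The split of responsibilities can be sketched as below, assuming an accumulate-and-return style ({{Throwable commit(Throwable accumulate)}}) similar to the one Cassandra's {{Transactional}} classes use. {{TxnSketch}}, {{InnerLog}}, {{RecordWriter}} and {{commitOrAbort}} are hypothetical. {{InnerLog}} plays the {{LogTransaction}} role and never throws from commit or abort. {{commitOrAbort}} plays the {{LifecycleTransaction}} role and decides to abort when commit reports a failure.

{code:java}
// Hypothetical sketch; the names are illustrative, NOT Cassandra's classes.
final class TxnSketch
{
    // Persists a final record; may fail, e.g. when the disk is full.
    interface RecordWriter
    {
        void write(String record) throws Exception;
    }

    // Inner log (the LogTransaction role): commit and abort never throw,
    // they return any failure merged into the accumulated one.
    static final class InnerLog
    {
        private final RecordWriter writer;
        private boolean completed = false;

        InnerLog(RecordWriter writer)
        {
            this.writer = writer;
        }

        boolean completed()
        {
            return completed;
        }

        Throwable commit(Throwable accumulate)
        {
            return complete("COMMIT", accumulate);
        }

        Throwable abort(Throwable accumulate)
        {
            return complete("ABORT", accumulate);
        }

        private Throwable complete(String record, Throwable accumulate)
        {
            try
            {
                writer.write(record); // disk first...
                completed = true;     // ...then memory
            }
            catch (Throwable t)
            {
                if (accumulate == null)
                    return t;
                accumulate.addSuppressed(t);
            }
            return accumulate;
        }
    }

    // Outer transaction (the LifecycleTransaction role): as the caller of
    // commit, it decides whether to abort after a failed commit.
    static Throwable commitOrAbort(InnerLog log)
    {
        Throwable fail = log.commit(null);
        if (fail != null)
            fail = log.abort(fail);
        return fail;
    }
}
{code}

With a writer that always fails, {{commitOrAbort}} returns the commit failure with the abort failure attached as suppressed, and {{completed()}} stays {{false}}.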

> Assertion failed in LogFile when disk is full
> ---------------------------------------------
>
>                 Key: CASSANDRA-10538
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10538
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Stefania
>            Assignee: Stefania
>             Fix For: 3.x
>
>         Attachments: 
> ma_txn_compaction_67311da0-72b4-11e5-9eb9-b14fa4bbe709.log, 
> ma_txn_compaction_696059b0-72b4-11e5-9eb9-b14fa4bbe709.log, 
> ma_txn_compaction_8ac58b70-72b4-11e5-9eb9-b14fa4bbe709.log, 
> ma_txn_compaction_8be24610-72b4-11e5-9eb9-b14fa4bbe709.log, 
> ma_txn_compaction_95500fc0-72b4-11e5-9eb9-b14fa4bbe709.log, 
> ma_txn_compaction_a41caa90-72b4-11e5-9eb9-b14fa4bbe709.log
>
>
> [~carlyeks] was running a stress job which filled up the disk. At the end of 
> the system logs there are several assertion errors:
> {code}
> ERROR [CompactionExecutor:1] 2015-10-14 20:46:55,467 CassandraDaemon.java:195 - Exception in thread Thread[CompactionExecutor:1,1,main]
> java.lang.RuntimeException: Insufficient disk space to write 2097152 bytes
>         at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.getWriteDirectory(CompactionAwareWriter.java:156) ~[main/:na]
>         at org.apache.cassandra.db.compaction.writers.MaxSSTableSizeWriter.realAppend(MaxSSTableSizeWriter.java:77) ~[main/:na]
>         at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.append(CompactionAwareWriter.java:110) ~[main/:na]
>         at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:182) ~[main/:na]
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[main/:na]
>         at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:78) ~[main/:na]
>         at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:61) ~[main/:na]
>         at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:220) ~[main/:na]
>         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_40]
>         at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_40]
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_40]
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_40]
>         at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]
> INFO  [IndexSummaryManager:1] 2015-10-14 21:10:40,099 IndexSummaryManager.java:257 - Redistributing index summaries
> ERROR [IndexSummaryManager:1] 2015-10-14 21:10:42,275 CassandraDaemon.java:195 - Exception in thread Thread[IndexSummaryManager:1,1,main]
> java.lang.AssertionError: Already completed!
>         at org.apache.cassandra.db.lifecycle.LogFile.abort(LogFile.java:221) ~[main/:na]
>         at org.apache.cassandra.db.lifecycle.LogTransaction.doAbort(LogTransaction.java:376) ~[main/:na]
>         at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.abort(Transactional.java:144) ~[main/:na]
>         at org.apache.cassandra.db.lifecycle.LifecycleTransaction.doAbort(LifecycleTransaction.java:259) ~[main/:na]
>         at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.abort(Transactional.java:144) ~[main/:na]
>         at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.abort(Transactional.java:193) ~[main/:na]
>         at org.apache.cassandra.utils.concurrent.Transactional$AbstractTransactional.close(Transactional.java:158) ~[main/:na]
>         at org.apache.cassandra.io.sstable.IndexSummaryManager.redistributeSummaries(IndexSummaryManager.java:242) ~[main/:na]
>         at org.apache.cassandra.io.sstable.IndexSummaryManager$1.runMayThrow(IndexSummaryManager.java:134) ~[main/:na]
>         at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[main/:na]
>         at org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolE
> {code}
> We should not use an assertion for a condition that can occur when the disk 
> is full; we should throw a runtime exception instead.
> I would also like to understand exactly what triggered the assertion. 
> {{LifecycleTransaction}} can throw at the beginning of the commit method if 
> it cannot write the record to disk, in which case all we have to do is ensure 
> we update the records in memory after writing to disk (currently we update 
> them before). However, I am not sure this is what happened here; it looks 
> more like abort was called twice, which should never happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
