[ https://issues.apache.org/jira/browse/CASSANDRA-11479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15222438#comment-15222438 ]
Yuki Morishita edited comment on CASSANDRA-11479 at 4/1/16 10:23 PM: --------------------------------------------------------------------- Code-wise, It looks like compaction still can be running during {{runWithCompactionDisabled}}. >From the log attached (Thank you!): {noformat} TRACE [MemtableFlushWriter:1] 2016-04-01 15:02:32,335 Scheduling a background task check for system.batches with SizeTieredCompactionStrategy {noformat} Here, compaction was submitted for flushed SSTable. {noformat} TRACE [main] 2016-04-01 15:02:32,337 Cancelling in-progress compactions for batches TRACE [CompactionExecutor:1] 2016-04-01 15:02:32,337 Compaction buckets are [] TRACE [main] 2016-04-01 15:02:32,337 Compactions successfully cancelled {noformat} and cancelling compaction happened, but since compaction strategy (STCS) failed to grab compacting SSTables, compaction task was not yet created. {code:java} // interrupt in-progress compactions CompactionManager.instance.interruptCompactionForCFs(selfWithAuxiliaryCfs, interruptValidation); CompactionManager.instance.waitForCessation(selfWithAuxiliaryCfs); {code} did not happen since there was no {{CompactionTask}} and no SSTables were in compaction yet. So, {noformat} TRACE [main] 2016-04-01 15:02:32,337 Discarding sstable data for truncated CF + indexes {noformat} truncate proceeded, but {noformat} TRACE [CompactionExecutor:1] 2016-04-01 15:02:32,337 Compaction buckets are [[BigTableReader(path='build/test/cassandra/data:0/system/batches-919a4bc57a333573b03e13fc3f68b465/ma-5-big-Data.db'), BigTableReader(path='build/test/cassandra/data:0/system/batches-919a4bc57a333573b03e13fc3f68b465/ma-4-big-Data.db')]] {noformat} in CompactionExecutor, STCS was still trying to grab compaction candidate and it succeeded, putting those two into {{compacting}} SSTable. EDIT: The first "Compaction buckets are: []" can be for repaired SSTables, and second one is for unrepaired. Thus, inside truncate logic, {code:java} public ReplayPosition discardSSTables(long truncatedAt) { assert data.getCompacting().isEmpty() : data.getCompacting(); .... } {code} throws AssertionError claiming some SSTables are still compacting. {noformat} INFO [CompactionExecutor:1] 2016-04-01 15:02:32,345 Compaction interrupted: Compaction@919a4bc5-7a33-3573-b03e-13fc3f68b465(system, batches, 0/240554)bytes {noformat} Eventually, the submitted {{CompactionTask}} was aborted since {{runWithCompactionDisabled}} marks compaction strategy to {{pause()}} at first. was (Author: yukim): Code-wise, It looks like compaction still can be running during {{runWithCompactionDisabled}}. >From the log attached (Thank you!): {noformat} TRACE [MemtableFlushWriter:1] 2016-04-01 15:02:32,335 Scheduling a background task check for system.batches with SizeTieredCompactionStrategy {noformat} Here, compaction was submitted for flushed SSTable. {noformat} TRACE [main] 2016-04-01 15:02:32,337 Cancelling in-progress compactions for batches TRACE [CompactionExecutor:1] 2016-04-01 15:02:32,337 Compaction buckets are [] TRACE [main] 2016-04-01 15:02:32,337 Compactions successfully cancelled {noformat} and cancelling compaction happened, but since compaction strategy (STCS) failed to grab compacting SSTables, compaction task was not yet created. {code:java} // interrupt in-progress compactions CompactionManager.instance.interruptCompactionForCFs(selfWithAuxiliaryCfs, interruptValidation); CompactionManager.instance.waitForCessation(selfWithAuxiliaryCfs); {code} did not happen since there was no {{CompactionTask}} and no SSTables were in compaction yet. So, {noformat} TRACE [main] 2016-04-01 15:02:32,337 Discarding sstable data for truncated CF + indexes {noformat} truncate proceeded, but {noformat} TRACE [CompactionExecutor:1] 2016-04-01 15:02:32,337 Compaction buckets are [[BigTableReader(path='build/test/cassandra/data:0/system/batches-919a4bc57a333573b03e13fc3f68b465/ma-5-big-Data.db'), BigTableReader(path='build/test/cassandra/data:0/system/batches-919a4bc57a333573b03e13fc3f68b465/ma-4-big-Data.db')]] {noformat} in CompactionExecutor, [STCS was still trying to grab compaction candidate eagerly|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategy.java#L180] and it succeeded, putting those two into {{compacting}} SSTable. Thus, inside truncate logic, {code:java} public ReplayPosition discardSSTables(long truncatedAt) { assert data.getCompacting().isEmpty() : data.getCompacting(); .... } {code} throws AssertionError claiming some SSTables are still compacting. {noformat} INFO [CompactionExecutor:1] 2016-04-01 15:02:32,345 Compaction interrupted: Compaction@919a4bc5-7a33-3573-b03e-13fc3f68b465(system, batches, 0/240554)bytes {noformat} Eventually, the submitted {{CompactionTask}} was aborted since {{runWithCompactionDisabled}} marks compaction strategy to {{pause()}} at first. > BatchlogManager unit tests failing on truncate race condition > ------------------------------------------------------------- > > Key: CASSANDRA-11479 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11479 > Project: Cassandra > Issue Type: Bug > Reporter: Joel Knighton > Attachments: > TEST-org.apache.cassandra.batchlog.BatchlogManagerTest.log > > > Example on CI > [here|http://cassci.datastax.com/job/trunk_testall/818/testReport/junit/org.apache.cassandra.batchlog/BatchlogManagerTest/testLegacyReplay_compression/]. > This seems to have only started happening relatively recently (within the > last month or two). > As far as I can tell, this is only showing up on BatchlogManagerTests purely > because it is an aggressive user of truncate. The assertion is hit in the > setUp method, so it can happen before any of the test methods. The assertion > occurs because a compaction is happening when truncate wants to discard > SSTables; trace level logs suggest that this compaction is submitted after > the pause on the CompactionStrategyManager. > This should be reproducible by running BatchlogManagerTest in a loop - it > takes up to half an hour in my experience. A trace-level log from such a run > is attached - grep for my added log message "SSTABLES COMPACTING WHEN > DISCARDING" to find when the assert is hit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)