[
https://issues.apache.org/jira/browse/CASSANDRA-3532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Eric Parusel updated CASSANDRA-3532:
------------------------------------
Attachment: 3532.txt
Here's a small patch, and a few notes:
- I wasn't sure how to best document the interaction with BITMAP_INDEX, hope a
TODO there is ok.
- I created a copy of descriptor.asTemporary(true). Is descriptor always
guaranteed to be temporary==true? It left me a little uneasy.
- I used SSTable.delete (while checking ahead of time that the file exists),
because I wasn't sure if ordering of deletes was important.
- SSTableReader.open() is the other method that calls SSTable.componentsFor, I
only mention this because it may be a suboptimal call as well.
> Compaction cleanupIfNecessary costly when many files in data dir
> ----------------------------------------------------------------
>
> Key: CASSANDRA-3532
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3532
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.0.0
> Environment: Solaris 10, 1.0.4 release candidate
> Reporter: Eric Parusel
> Labels: compaction
> Fix For: 1.0.5
>
> Attachments: 3532.txt
>
>
> From what I can tell SSTableWriter.cleanupIfNecessary seems increasingly
> costly as the number of files in the data dir increases.
> It calls SSTable.componentsFor(descriptor, Descriptor.TempState.TEMP) which
> lists all files in the data dir to find matching components.
> Am I roughly correct that (cleanupCost = SSTable count * data dir size)?
> We had been doing write load testing with default compaction throttling
> (16MB/s) and LeveledCompaction.
> Unfortunately we haven't been keeping tabs on sstable counts and it grew out
> of control.
> On a system with 300,000 sstables (!) here is an example of our compaction
> rate. Note that as you're probably aware cleanupIfNecessary is included in
> the timing:
> INFO [CompactionExecutor:48] 2011-11-25 22:25:30,353 CompactionTask.java
> (line 213) Compacted to
> [/data1/cassandra/data/MA_DDR/indexes_03-hc-5369-Data.db,]. 5,821,590 to
> 5,306,354 (~91% of original) bytes for 123 keys at 0.163755MB/s. Time:
> 30,903ms.
> Here's a slightly larger one:
> INFO [CompactionExecutor:43] 2011-11-25 22:23:28,956 CompactionTask.java
> (line 213) Compacted to
> [/data1/cassandra/data/MA_DDR/indexes_03-hc-5336-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5337-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5338-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5339-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5340-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5341-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5342-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5343-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5344-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5345-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5346-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5347-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5348-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5349-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5350-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5351-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5352-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5353-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5354-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5355-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5356-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5357-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5358-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5359-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5360-Data.db,/data1/cassandra/data/MA_DDR/indexes_03-hc-5361-Data.db,].
> 140,706,512 to 137,990,868 (~98% of original) bytes for 2,181 keys at
> 0.338627MB/s. Time: 388,623ms.
> This is with compaction throttling set to 0 (Off).
> So I believe because of this it's going to take a very long time to recover
> from having so many small sstables.
> It might be notable that we're using Solaris 10, possibly listFiles() is
> faster on other platforms?
> Is it feasible to keep track of the temp files and just delete them rather
> than searching for them for each SSTable using SSTable.componentsFor()?
> Here's the stack trace for the CompactionExecutor:14 thread that appears to
> be occupying the majority of the cpu time on this node:
> Name: CompactionExecutor:14
> State: RUNNABLE
> Total blocked: 3 Total waited: 1,610,714
> Stack trace:
> java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
> java.io.UnixFileSystem.getBooleanAttributes(Unknown Source)
> java.io.File.isDirectory(Unknown Source)
> org.apache.cassandra.io.sstable.SSTable$3.accept(SSTable.java:204)
> java.io.File.listFiles(Unknown Source)
> org.apache.cassandra.io.sstable.SSTable.componentsFor(SSTable.java:200)
> org.apache.cassandra.io.sstable.SSTableWriter.cleanupIfNecessary(SSTableWriter.java:289)
> org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:189)
> org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:57)
> org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:134)
> org.apache.cassandra.db.compaction.CompactionManager$1.call(CompactionManager.java:114)
> java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
> java.util.concurrent.FutureTask.run(Unknown Source)
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
> java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> java.lang.Thread.run(Unknown Source)
> No matter where I click in the busy Compaction thread timeline in YourKit
> it's in Running state and showing this above trace, except for short periods
> of time where it's actually compacting :)
> Thanks,
> Eric
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira