[ https://issues.apache.org/jira/browse/CASSANDRA-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sylvain Lebresne updated CASSANDRA-3234:
----------------------------------------
Attachment: 0005-use-Array-and-Tree-backed-columns-in-compaction-v2.patch
I haven't looked at the first 3 patches, only at patches 4 and 5.
+1 on patch 4 (though I agree with the comment in there that it's not the most
beautiful refactor ever :))
On patch 5, it calls cloneMeShallow on the first column family read, which
basically skips all of its columns, so that's wrong. Attaching a v2 that makes
SSTII directly use the right ISortedColumn factory (to avoid a full clone). The
problem is that this doesn't translate too well to ParallelCompactionIterable,
since the actual read happens deep in the code. For it, I think we have 3 easy
solutions:
* Just use ArraySortedColumns all the way. This is actually ok because addAll
works whatever the input is: it does a merge.
* Do a full clone to a TreeMapBacked CF on the first CF read.
* Use TreeMapBacked CFs all the way.
I went with the first solution in the attached patch (mostly because it requires
fewer changes than the others), though that's probably not optimal for
LeveledCompaction (but I'm not sure ParallelCompaction is useful for
LeveledCompaction).
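
To illustrate why an array-backed container can be used all the way, here is a
minimal sketch of an addAll that merges sorted input into a sorted backing
list. It is not the code from the patch: the Column and ArrayColumns classes
are hypothetical stand-ins, and the "higher timestamp wins" rule is a
simplifying assumption about how colliding columns are reconciled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ArraySortedColumnsSketch {

    /** Hypothetical, simplified column: a name plus a write timestamp. */
    static final class Column {
        final String name;
        final long timestamp;

        Column(String name, long timestamp) {
            this.name = name;
            this.timestamp = timestamp;
        }
    }

    /** Hypothetical array-backed container that keeps columns sorted by name. */
    static final class ArrayColumns {
        private List<Column> columns = new ArrayList<>();

        /**
         * Merges an already-sorted batch into the sorted backing list in one
         * linear pass. Assumption: on a name collision the column with the
         * higher timestamp wins, and ties go to the incoming column.
         */
        void addAll(List<Column> sortedInput) {
            List<Column> merged = new ArrayList<>(columns.size() + sortedInput.size());
            int i = 0, j = 0;
            while (i < columns.size() && j < sortedInput.size()) {
                Column mine = columns.get(i);
                Column theirs = sortedInput.get(j);
                int cmp = mine.name.compareTo(theirs.name);
                if (cmp < 0) {
                    merged.add(mine);
                    i++;
                } else if (cmp > 0) {
                    merged.add(theirs);
                    j++;
                } else {
                    merged.add(theirs.timestamp >= mine.timestamp ? theirs : mine);
                    i++;
                    j++;
                }
            }
            // Copy whatever is left on either side; at most one loop runs.
            while (i < columns.size()) merged.add(columns.get(i++));
            while (j < sortedInput.size()) merged.add(sortedInput.get(j++));
            columns = merged;
        }

        /** Column names in their stored (sorted) order. */
        List<String> names() {
            List<String> out = new ArrayList<>();
            for (Column c : columns) out.add(c.name);
            return out;
        }

        /** Timestamp of the named column, or -1 if it is absent. */
        long timestampOf(String name) {
            for (Column c : columns) if (c.name.equals(name)) return c.timestamp;
            return -1;
        }
    }

    public static void main(String[] args) {
        ArrayColumns cf = new ArrayColumns();
        cf.addAll(Arrays.asList(new Column("a", 1), new Column("c", 1)));
        cf.addAll(Arrays.asList(new Column("b", 2), new Column("c", 2)));
        System.out.println(cf.names()); // prints [a, b, c]
    }
}
```

Because the merge only relies on the input being sorted, the same addAll works
no matter which kind of container produced that input, which is the property
the first option depends on.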
> LeveledCompaction has several performance problems
> --------------------------------------------------
>
> Key: CASSANDRA-3234
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3234
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Affects Versions: 1.0.0
> Reporter: Jonathan Ellis
> Assignee: Jonathan Ellis
> Fix For: 1.0.0
>
> Attachments: 0001-optimize-single-source-case-for-MergeIterator.txt,
> 0002-add-TrivialOneToOne-optimization.txt,
> 0003-fix-leveled-BF-size-calculation.txt,
> 0004-avoid-calling-shouldPurge-unless-necessary.txt,
> 0005-use-Array-and-Tree-backed-columns-in-compaction-v2.patch,
> 0005-use-Array-and-Tree-backed-columns-in-compaction.txt
>
>
> Two main problems:
> - BF size calculation doesn't take into account LCS breaking the output apart
> into "bite sized" sstables, so memory use is much higher than predicted
> - ManyToMany merging is slow. At least part of this is from running the full
> reducer machinery against single input sources, which can be optimized away.