[ 
https://issues.apache.org/jira/browse/CASSANDRA-3234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-3234:
----------------------------------------

    Attachment: 0005-use-Array-and-Tree-backed-columns-in-compaction-v2.patch

I haven't looked at the first 3 patches, only at patches 4 and 5.

+1 on patch 4 (though I agree with the comment in there that it's not the most 
beautiful refactor ever :))

On patch 5, it calls cloneMeShallow on the first column family read and so 
basically skips all of its columns, which is wrong. Attaching a v2 that makes 
SSTII directly use the right ISortedColumn factory (to avoid full cloning). The 
problem is that this doesn't translate too well to ParallelCompactionIterable, 
since the actual read is deep in the code. For it, I think we have 3 easy 
solutions:
  * Just use ArraySortedColumns all the way. This is actually ok because addAll 
works whatever the input is: it does a merge.
  * Do a full clone to a TreeMapBacked CF on the first CF read.
  * Use TreeMapBacked CFs all the way.

I went with the first solution in the attached patch (mostly because it 
requires fewer changes than the others), though that's probably not optimal 
for LeveledCompaction (but I'm not sure ParallelCompaction is useful for 
LeveledCompaction).
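
To illustrate why the first option is safe, here is a minimal sketch of an 
array-backed container whose addAll merges its input rather than assuming 
appends. It is not the code from the patch: the class name 
SortedArrayColumnsSketch, the use of plain String column names, and the 
"incoming column wins on a name collision" rule are all assumptions made for 
illustration (the real code reconciles columns rather than blindly replacing 
them). It also assumes the input iterates in sorted order, as both an 
array-backed and a tree-backed container would.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an array-backed sorted column container. */
class SortedArrayColumnsSketch {
    // Column names kept in ascending order; values are omitted for brevity.
    private List<String> names = new ArrayList<>();

    /** Merges an already-sorted input into this container in one linear pass. */
    public void addAll(List<String> sortedInput) {
        List<String> merged = new ArrayList<>(names.size() + sortedInput.size());
        int i = 0;
        int j = 0;
        while (i < names.size() && j < sortedInput.size()) {
            int cmp = names.get(i).compareTo(sortedInput.get(j));
            if (cmp < 0) {
                merged.add(names.get(i++));
            } else if (cmp > 0) {
                merged.add(sortedInput.get(j++));
            } else {
                // Same name on both sides: keep the incoming one (assumed rule).
                merged.add(sortedInput.get(j++));
                i++;
            }
        }
        // Copy whatever remains on either side.
        while (i < names.size()) {
            merged.add(names.get(i++));
        }
        while (j < sortedInput.size()) {
            merged.add(sortedInput.get(j++));
        }
        names = merged;
    }

    public List<String> names() {
        return names;
    }
}
```

Because the merge only relies on the input being sorted, it does not matter 
which kind of container produced that input. The cost is a new backing list 
per addAll call, which is one reason this may not be the best fit everywhere.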

> LeveledCompaction has several performance problems
> --------------------------------------------------
>
>                 Key: CASSANDRA-3234
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3234
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>    Affects Versions: 1.0.0
>            Reporter: Jonathan Ellis
>            Assignee: Jonathan Ellis
>             Fix For: 1.0.0
>
>         Attachments: 0001-optimize-single-source-case-for-MergeIterator.txt, 
> 0002-add-TrivialOneToOne-optimization.txt, 
> 0003-fix-leveled-BF-size-calculation.txt, 
> 0004-avoid-calling-shouldPurge-unless-necessary.txt, 
> 0005-use-Array-and-Tree-backed-columns-in-compaction-v2.patch, 
> 0005-use-Array-and-Tree-backed-columns-in-compaction.txt
>
>
> Two main problems:
> - BF size calculation doesn't take into account LCS breaking the output apart 
> into "bite sized" sstables, so memory use is much higher than predicted
> - ManyToMany merging is slow.  At least part of this is from running the full 
> reducer machinery against single input sources, which can be optimized away.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
