[jira] [Commented] (CASSANDRA-8915) Improve MergeIterator performance

Stefania (JIRA) Fri, 03 Apr 2015 02:34:50 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-8915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394244#comment-14394244
 ]


Stefania commented on CASSANDRA-8915:
-------------------------------------

In case you guys have not seen it yet, please check the changes proposed by 
CASSANDRA-8180, specifically this comment here: 
https://issues.apache.org/jira/browse/CASSANDRA-8180?focusedCommentId=14381674&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14381674.

The idea is that there will be two type of candidates, one greedy that knows 
its first value as it is the case right now. Another one, lazy, that gets 
compared based on a less accurate lower bound. What this means is that once 
this lazy candidate is picked, only then will it access the iterator to 
determine the exact first value, which could be much higher that the initial 
lower bound. 

The way I implemented this with the present implementation of the merge 
iterator is to add the lazy candidate back to the priority queue after it has 
calculated its first accurate value. It's not very elegant however and it is 
kind of wasteful.

If too complex to merge both approaches in one algorithm, we can always 
specialize a separate merge iterator implementation for supporting lazy 
candidates.

> Improve MergeIterator performance
> ---------------------------------
>
>                 Key: CASSANDRA-8915
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8915
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Branimir Lambov
>            Assignee: Branimir Lambov
>            Priority: Minor
>
> The implementation of {{MergeIterator}} uses a priority queue and applies a 
> pair of {{poll}}+{{add}} operations for every item in the resulting sequence. 
> This is quite inefficient as {{poll}} necessarily applies at least {{log N}} 
> comparisons (up to {{2log N}}), and {{add}} often requires another {{log N}}, 
> for example in the case where the inputs largely don't overlap (where {{N}} 
> is the number of iterators being merged).
> This can easily be replaced with a simple custom structure that can perform 
> replacement of the top of the queue in a single step, which will very often 
> complete after a couple of comparisons and in the worst case scenarios will 
> match the complexity of the current implementation.
> This should significantly improve merge performance for iterators with 
> limited overlap (e.g. levelled compaction).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Commented] (CASSANDRA-8915) Improve MergeIterator performance

Reply via email to