[jira] [Commented] (HBASE-9969) Improve KeyValueHeap using loser tree

Chao Shi (JIRA) Mon, 25 Nov 2013 08:37:51 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13831610#comment-13831610
 ]


Chao Shi commented on HBASE-9969:
---------------------------------

bq. It looks like the LoserTree always maintains the top element at the top 
after a call to updateTopValue(current), and you can easily peek() the top 
element (using topValue()). Could the LoserTreeKeyValueHeap simply compare the 
current scanner to topValue(), and only updateTopValue(current) if it's greater 
than topValue()? I think that's what current KeyValueHeap does.

I don't think we can do this with LoserTree, because it can only used to 
determine minimal among a *fixed* number of streams at construction time. 
topValue(). It does not support "extractMin" operation as defined by a heap. So 
the winner must remain in the tree.

I tried to optimize LoserTreeKeyValueHeap and got ~10% improvement, but it is 
still worse than the priority queue based implementations. The LoserTree knows 
whether the added value is on the same row with others. I'm trying to think 
about other optimization techniques.

bq. Loser tree will always need at least 2 comparison for consecutive 
optimization. One need to compare new element with winner of a first half tree 
and with 2nd runner of a second half tree (the previous top element was a 
winner in this half - tree).

Is it true? The global 2nd runner of the tree may sit very near the leave. To 
find it out, we need logN comparisons.

> Improve KeyValueHeap using loser tree
> -------------------------------------
>
>                 Key: HBASE-9969
>                 URL: https://issues.apache.org/jira/browse/HBASE-9969
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance, regionserver
>            Reporter: Chao Shi
>            Assignee: Chao Shi
>             Fix For: 0.98.0, 0.96.1
>
>         Attachments: 9969-0.94.txt, KeyValueHeapBenchmark_v1.ods, 
> KeyValueHeapBenchmark_v2.ods, hbase-9969-pq-v1.patch, hbase-9969-pq-v2.patch, 
> hbase-9969-v2.patch, hbase-9969-v3.patch, hbase-9969.patch, hbase-9969.patch, 
> kvheap-benchmark.png, kvheap-benchmark.txt
>
>
> LoserTree is the better data structure than binary heap. It saves half of the 
> comparisons on each next(), though the time complexity is on O(logN).
> Currently A scan or get will go through two KeyValueHeaps, one is merging KVs 
> read from multiple HFiles in a single store, the other is merging results 
> from multiple stores. This patch should improve the both cases whenever CPU 
> is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811).
> All of the optimization work is done in KeyValueHeap and does not change its 
> public interfaces. The new code looks more cleaner and simpler to understand.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

[jira] [Commented] (HBASE-9969) Improve KeyValueHeap using loser tree

Reply via email to