[ 
https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13829666#comment-13829666
 ] 

Matt Corgan commented on HBASE-9969:
------------------------------------

It looks like the LoserTree always maintains the top element at the top after a 
call to updateTopValue(current), and you can easily peek() the top element 
(using topValue()).  Could the LoserTreeKeyValueHeap simply compare the current 
scanner to topValue(), and only updateTopValue(current) if it's greater than 
topValue()?  I think that's what current KeyValueHeap does.

{quote}Another optimization: as KVs are usually sharing common prefixes, we can 
skip such common part during comparisons. I haven't thought on this carefully. 
Just mention it here in case someone may have better thoughts based on 
it.{quote}yeah, I think there's potential here.  You could maintain a heap for 
each level: row, family, qualifier, timestamp.  This would probably require 
adding methods to the KeyValueScanner: nextRow(), nextQualifier(), etc.  
PrefixTree, in particular, would benefit from this because it can execute the 
nextRow() method with a trivial amount of work while the other encodings have 
to scan through every row key in the block and do a comparison on it.  If 
someone tried it, you could start with only the nextRow() optimization which 
would have the biggest payoff.

> Improve KeyValueHeap using loser tree
> -------------------------------------
>
>                 Key: HBASE-9969
>                 URL: https://issues.apache.org/jira/browse/HBASE-9969
>             Project: HBase
>          Issue Type: Improvement
>          Components: Performance, regionserver
>            Reporter: Chao Shi
>            Assignee: Chao Shi
>             Fix For: 0.98.0, 0.96.1
>
>         Attachments: 9969-0.94.txt, KeyValueHeapBenchmark_v1.ods, 
> KeyValueHeapBenchmark_v2.ods, hbase-9969-pq-v1.patch, hbase-9969-pq-v2.patch, 
> hbase-9969-v2.patch, hbase-9969-v3.patch, hbase-9969.patch, hbase-9969.patch, 
> kvheap-benchmark.png, kvheap-benchmark.txt
>
>
> LoserTree is the better data structure than binary heap. It saves half of the 
> comparisons on each next(), though the time complexity is on O(logN).
> Currently A scan or get will go through two KeyValueHeaps, one is merging KVs 
> read from multiple HFiles in a single store, the other is merging results 
> from multiple stores. This patch should improve the both cases whenever CPU 
> is the bottleneck (e.g. scan with filter over cached blocks, HBASE-9811).
> All of the optimization work is done in KeyValueHeap and does not change its 
> public interfaces. The new code looks more cleaner and simpler to understand.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

Reply via email to