[
https://issues.apache.org/jira/browse/HBASE-9969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13828176#comment-13828176
]
Matt Corgan commented on HBASE-9969:
------------------------------------
{quote}From your experiments, looks like there is one winner{quote}
I don't
know... I think LoserTree does fewer comparisons when there are more scanners
and is therefore better. However, it looks like KeyValueScannerHeap is faster
with consecutive KVs from the same scanner (a single scanner or many cols/row).
I'm hoping this benchmark can help us mix and match the best aspects of the
two.
I may even take a stab at replacing the binary search behavior in
KeyValueScannerHeap with a hard-coded comparison order when numScanners is
between 2 and ~4.
> Improve KeyValueHeap using loser tree
> -------------------------------------
>
> Key: HBASE-9969
> URL: https://issues.apache.org/jira/browse/HBASE-9969
> Project: HBase
> Issue Type: Improvement
> Components: Performance, regionserver
> Reporter: Chao Shi
> Assignee: Chao Shi
> Fix For: 0.98.0, 0.96.1
>
> Attachments: 9969-0.94.txt, KeyValueHeapBenchmark_v1.ods,
> KeyValueHeapBenchmark_v2.ods, hbase-9969-pq-v1.patch, hbase-9969-pq-v2.patch,
> hbase-9969-v2.patch, hbase-9969-v3.patch, hbase-9969.patch, hbase-9969.patch,
> kvheap-benchmark.png, kvheap-benchmark.txt
>
>
> LoserTree is a better data structure than a binary heap. It saves half of the
> comparisons on each next(), though the time complexity is still O(log N).
> Currently, a scan or get goes through two KeyValueHeaps: one merges KVs read
> from multiple HFiles in a single store, the other merges results from
> multiple stores. This patch should improve both cases whenever CPU is the
> bottleneck (e.g. scan with filter over cached blocks, HBASE-9811).
> All of the optimization work is done in KeyValueHeap and does not change its
> public interfaces. The new code is cleaner and simpler to understand.
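
For readers unfamiliar with the structure, here is a minimal sketch of a loser
tree doing a k-way merge. It is not the code from the attached patches: the
class name LoserTreeMerge, the use of sorted int arrays as stand-ins for
scanners, and the comparison counter are all assumptions made for illustration.
The point it shows is that after each next() only the path from the advanced
leaf to the root is replayed, at one comparison per level, where a binary heap
sift-down can need two comparisons per level.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative loser tree over sorted int arrays (hypothetical stand-ins for scanners).
class LoserTreeMerge {
    private final int[][] runs; // sorted inputs
    private final int[] pos;    // next unread index in each run
    private final int[] tree;   // tree[0] = current winner, tree[1..k-1] = loser of each match
    private final int k;
    long comparisons = 0;       // counts every head-to-head match, for illustration only

    LoserTreeMerge(int[][] runs) {
        this.runs = runs;
        this.k = runs.length;
        this.pos = new int[k];
        this.tree = new int[Math.max(k, 1)];
        if (k > 0) {
            tree[0] = build(1);
        }
    }

    private boolean exhausted(int s) {
        return pos[s] >= runs[s].length;
    }

    // True if the head of run a should be emitted before the head of run b.
    // An exhausted run never wins; ties go to the lower run index.
    private boolean beats(int a, int b) {
        comparisons++;
        if (exhausted(a)) return false;
        if (exhausted(b)) return true;
        int x = runs[a][pos[a]];
        int y = runs[b][pos[b]];
        return x < y || (x == y && a < b);
    }

    // Plays the initial tournament below 'node'; stores losers, returns the winner.
    // Nodes 1..k-1 are internal; nodes k..2k-1 are leaves for runs 0..k-1.
    private int build(int node) {
        if (node >= k) return node - k;
        int left = build(2 * node);
        int right = build(2 * node + 1);
        boolean leftWins = beats(left, right);
        tree[node] = leftWins ? right : left;
        return leftWins ? left : right;
    }

    boolean hasNext() {
        return k > 0 && !exhausted(tree[0]);
    }

    int next() {
        int winner = tree[0];
        int value = runs[winner][pos[winner]++];
        // Replay only the path from the advanced leaf up to the root:
        // one comparison per level against the stored loser.
        for (int node = (winner + k) / 2; node >= 1; node /= 2) {
            if (beats(tree[node], winner)) {
                int t = tree[node];
                tree[node] = winner;
                winner = t;
            }
        }
        tree[0] = winner;
        return value;
    }

    static List<Integer> merge(int[][] runs) {
        LoserTreeMerge m = new LoserTreeMerge(runs);
        List<Integer> out = new ArrayList<>();
        while (m.hasNext()) out.add(m.next());
        return out;
    }

    public static void main(String[] args) {
        int[][] runs = { {1, 4, 7}, {2, 5, 8}, {3, 6, 9}, {0, 10} };
        LoserTreeMerge m = new LoserTreeMerge(runs);
        List<Integer> out = new ArrayList<>();
        while (m.hasNext()) out.add(m.next());
        System.out.println(out + " comparisons=" + m.comparisons);
        System.out.println(merge(new int[][] { {}, {1, 1}, {1} }));
        System.out.println(Arrays.toString(runs[3]));
    }
}
```

In this sketch an exhausted run simply loses every match, so no sentinel key is
needed. A real KeyValueHeap would compare KeyValues with a comparator and would
also have to handle seek and reseek, which this example leaves out.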
--
This message was sent by Atlassian JIRA
(v6.1#6144)