[ 
https://issues.apache.org/jira/browse/HBASE-9000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Shi updated HBASE-9000:
----------------------------

    Attachment: hbase-9000-port-fb.patch

The attached patch is a port of the linear seek code from the 0.89-fb branch
(with minor changes). I'm not sure whether 20 is a good default value for the
maximum number of linear seeks.
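
For readers without the patch at hand, the approach described in the issue (a bounded number of linear steps, then a logarithmic seek) might look roughly like the sketch below. It is not the patch's code: the class, field, and method names are hypothetical, plain Long keys stand in for KeyValues, and a ConcurrentSkipListSet stands in for the kvset. Only the limit of 20 comes from the discussion above.

```java
import java.util.Iterator;
import java.util.NavigableSet;
import java.util.concurrent.ConcurrentSkipListSet;

/** Hypothetical stand-in for a memstore scanner; NOT the HBase MemStoreScanner. */
class LinearReseekSketch {
    private final NavigableSet<Long> keys;  // sorted keys, standing in for the kvset
    private final int maxLinearSteps;       // assumed knob; 20 is the value under discussion
    private Iterator<Long> it;
    private Long current;                   // key the scanner is positioned on, or null
    int logarithmicSeeks = 0;               // illustration only: counts fallbacks

    LinearReseekSketch(NavigableSet<Long> keys, int maxLinearSteps) {
        this.keys = keys;
        this.maxLinearSteps = maxLinearSteps;
        this.it = keys.iterator();
        this.current = it.hasNext() ? it.next() : null;
    }

    Long peek() {
        return current;
    }

    /** Moves forward to the first key >= target; returns false if there is none. */
    boolean reseek(long target) {
        for (int steps = 0; current != null && current < target; steps++) {
            if (steps == maxLinearSteps) {
                // Linear budget exhausted: fall back to a skip-list lookup.
                logarithmicSeeks++;
                it = keys.tailSet(target, true).iterator();
                current = it.hasNext() ? it.next() : null;
                break;
            }
            // Cheap path: advance one entry at a time, like calling next().
            current = it.hasNext() ? it.next() : null;
        }
        return current != null;
    }

    public static void main(String[] args) {
        NavigableSet<Long> keys = new ConcurrentSkipListSet<>();
        for (long i = 0; i < 100; i++) {
            keys.add(i);
        }
        LinearReseekSketch scanner = new LinearReseekSketch(keys, 20);
        scanner.reseek(5);   // short hop: stays on the linear path
        scanner.reseek(50);  // long hop: exceeds the budget and falls back
        System.out.println(scanner.peek() + " after " + scanner.logarithmicSeeks + " fallback(s)");
    }
}
```

In this sketch a short hop costs only a few iterator steps, while a long hop pays for the wasted linear steps plus the logarithmic lookup, which is the trade-off the default limit has to balance.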

Benchmark result:
||operation||trunk||w/ patch||
|reseek to next row|5.92 us|6.71 us|
|reseek to next column|3.735 us|0.569 us|

Configuration:
rows: 100000
columns per row: 10
versions: 3
size of row-key: 8
size of qualifier: 8
size of value: 8

bq. In all fairness, we should not divide the runtime by the number of ops. The 
whole point of seeking is to reduce the number of ops
In fact, the cost of next is listed here only for reference (e.g. to tune the
limit of linear seeks) and should not be compared to the cost of a reseek. In
our use case, which scans a single row with a very large offset and a small
limit, the cost of a single reseek is more meaningful, as we can directly
multiply it by the offset. I can understand that in some other cases the total
time may be more important.

In any case, the goal of the benchmark program is to evaluate the performance
gain from linear search by comparing these numbers with and without the patch.
The percentage improvement is the same either way.

I like [~lhofhansl]'s idea of passing a hint from ScanQueryMatcher, which
should also benefit StoreFileScanner. I think we could also save some
statistics when an HFile is written, such as the average number of versions or
columns, which would help us determine whether a "reseek to next row" is really
far enough to justify a reseek.
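
One possible reading of the statistics idea, as a rough sketch only: estimate how many KeyValues a reseek is likely to skip and compare that with the linear-seek limit. The class and method names below are hypothetical, and no such statistics or API exist in HBase as far as this thread shows. With the benchmark configuration above, a full row holds 10 x 3 = 30 KeyValues, which is more than a limit of 20, while a single column holds only 3.

```java
/** Hypothetical heuristic; the per-file statistics are assumed, not an existing HBase API. */
class ReseekHintSketch {
    /** Upper bound on KeyValues in one row, from assumed per-file averages. */
    static double expectedKvsPerRow(double avgColumnsPerRow, double avgVersionsPerColumn) {
        return avgColumnsPerRow * avgVersionsPerColumn;
    }

    /** Prefer linear stepping only if the expected skip fits within the linear budget. */
    static boolean preferLinear(double expectedKvsToSkip, int maxLinearSteps) {
        return expectedKvsToSkip <= maxLinearSteps;
    }

    public static void main(String[] args) {
        int maxLinearSteps = 20;                      // value discussed in this thread
        double perRow = expectedKvsPerRow(10, 3);     // benchmark config: 10 columns, 3 versions
        System.out.println("next row, linear?    " + preferLinear(perRow, maxLinearSteps));
        System.out.println("next column, linear? " + preferLinear(3, maxLinearSteps));
    }
}
```

The per-row figure is an upper bound, since a scanner that is already partway through a row has fewer KeyValues left to skip, so a real hint would probably need the current position as well.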

> Linear reseek in Memstore
> -------------------------
>
>                 Key: HBASE-9000
>                 URL: https://issues.apache.org/jira/browse/HBASE-9000
>             Project: HBase
>          Issue Type: Improvement
>    Affects Versions: 0.89-fb
>            Reporter: Shane Hogan
>            Priority: Minor
>             Fix For: 0.89-fb
>
>         Attachments: hbase-9000-benchmark-program.patch, 
> hbase-9000-port-fb.patch
>
>
> This is to address the linear reseek in MemStoreScanner. Currently reseek 
> iterates over the kvset and the snapshot linearly by just calling next 
> repeatedly. The new solution is to do this linear seek up to a configurable 
> maximum number of times and then, if the seek is not yet complete, fall back 
> to a logarithmic seek.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
