Lars Hofhansl created HBASE-9778:
------------------------------------
Summary: Avoid seeking to next column in ExplicitColumnTracker
when possible
Key: HBASE-9778
URL: https://issues.apache.org/jira/browse/HBASE-9778
Project: HBase
Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Fix For: 0.98.0, 0.94.13, 0.96.1
The issue of slow seeking in ExplicitColumnTracker was brought up by
[~vrodionov] on the dev list.
My idea here is to avoid the seek when we know that there aren't many versions
to skip.
How do we know? We'll use the column family's VERSIONS setting as a hint. If
VERSIONS is set to 1 (or maybe some value < 10) we'll avoid the seek and call
SKIP repeatedly.
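To make the heuristic concrete, here is a minimal sketch of the decision, not the actual patch. The class name, the Action enum, and the cutoff constant are all hypothetical; the cutoff of 10 is only the "maybe some value < 10" guess from above, and the real tracker has more state to consider.

```java
// Hypothetical, self-contained sketch of the heuristic described above.
// It does not use or mirror the real HBase classes.
final class SeekHintSketch {

    // Stand-ins for the two ways of moving on to the next column.
    enum Action { SKIP, SEEK_NEXT_COL }

    // Assumed cutoff: the description only suggests "1 (or maybe some value < 10)".
    static final int SKIP_THRESHOLD = 10;

    // maxVersions is the column family's VERSIONS setting, used purely as a hint
    // for how many cells we would have to step over to reach the next column.
    static Action nextColumnAction(int maxVersions) {
        // Few versions per column: stepping over them one by one is assumed
        // to be cheaper than a full seek.
        if (maxVersions < SKIP_THRESHOLD) {
            return Action.SKIP;
        }
        // Potentially many versions: fall back to seeking.
        return Action.SEEK_NEXT_COL;
    }

    public static void main(String[] args) {
        System.out.println("VERSIONS=1   -> " + nextColumnAction(1));
        System.out.println("VERSIONS=100 -> " + nextColumnAction(100));
    }
}
```

With VERSIONS=1, as in the measurements below, this sketch always chooses SKIP.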
HBASE-9769 has some initial numbers for this approach.
Interestingly, the effect depends on which column(s) are selected.
Some numbers: 4m rows, 5 cols each, 1 cf, 10-byte values, VERSIONS=1,
everything filtered at the server with a ValueFilter. All times are in
seconds.
Without patch:
||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
|6.4|8.5|14.3|14.6|11.1|20.3|
With patch:
||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
|6.4|8.4|8.9|9.9|6.4|10.0|
Variation here was +- 0.2s.
So with this patch scanning is up to 2x faster (Col 2+4: 20.3s vs. 10.0s) and
never slower. No special hint is needed beyond declaring VERSIONS correctly.