[ https://issues.apache.org/jira/browse/HBASE-9778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13797258#comment-13797258 ]
stack commented on HBASE-9778:
------------------------------
[~lhofhansl] Do you think the VERSIONS config is an indicator of the absolute
number of versions of a particular column? I can think of pathological cases
where HBase is being used to keep, say, a queue where the client is only
interested in the most recent cell but many writers could be writing to the
one coordinate. In that case, we'd want to seek to the next column rather than
skip, skip, skip, right? Could we do something like the Jesse heuristic, where
we keep a count and skip the first few versions but then switch to a seek if
it looks like we are on a column with many versions?
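
To make the suggestion concrete, here is a minimal sketch of such a counting heuristic. It is not taken from any patch on this issue: the class name, the Hint enum and the skipLimit threshold are all hypothetical stand-ins, and a real tracker would return the scanner's own match codes instead.

```java
// Sketch only: AdaptiveSkipSketch, Hint and skipLimit are hypothetical names,
// not part of HBase or of any patch attached to this issue.
class AdaptiveSkipSketch {
    /** Stand-in for the scanner's "what to do next" decision. */
    enum Hint { SKIP, SEEK_NEXT_COL }

    private final int skipLimit;      // assumed threshold: how many cheap skips to try
    private int skippedInColumn = 0;  // unwanted versions skipped in the current column

    AdaptiveSkipSketch(int skipLimit) {
        this.skipLimit = skipLimit;
    }

    /** Call whenever the scan moves on to a new column. */
    void startColumn() {
        skippedInColumn = 0;
    }

    /** Call for each further version of the current column that is not wanted. */
    Hint onUnwantedVersion() {
        if (skippedInColumn < skipLimit) {
            skippedInColumn++;
            return Hint.SKIP;          // few versions so far: skipping is cheaper
        }
        return Hint.SEEK_NEXT_COL;     // looks like a many-version column: seek
    }

    public static void main(String[] args) {
        AdaptiveSkipSketch tracker = new AdaptiveSkipSketch(3);
        tracker.startColumn();
        for (int i = 0; i < 5; i++) {
            System.out.println(tracker.onUnwantedVersion());
        }
    }
}
```

With a limit of 3, the first three unwanted versions in a column are skipped and every later one triggers a seek, so the queue-like case pays for only a few skips before falling back to seeking.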
> Avoid seeking to next column in ExplicitColumnTracker when possible
> -------------------------------------------------------------------
>
> Key: HBASE-9778
> URL: https://issues.apache.org/jira/browse/HBASE-9778
> Project: HBase
> Issue Type: Bug
> Reporter: Lars Hofhansl
> Assignee: Lars Hofhansl
> Fix For: 0.98.0, 0.94.13, 0.96.1
>
> Attachments: 9778-0.94.txt, 9778-0.94-v2.txt, 9778-0.94-v3.txt,
> 9778-trunk.txt, 9778-trunk-v2.txt, 9778-trunk-v3.txt
>
>
> The issue of slow seeking in ExplicitColumnTracker was brought up by
> [~vrodionov] on the dev list.
> My idea here is to avoid the seeking if we know that there aren't many
> versions to skip.
> How do we know? We'll use the column family's VERSIONS setting as a hint. If
> VERSIONS is set to 1 (or maybe some value < 10) we'll avoid the seek and call
> SKIP repeatedly.
> HBASE-9769 has some initial numbers for this approach.
> Interestingly, the result depends on which column(s) are selected.
> Some numbers: 4M rows, 5 columns each, 1 column family, 10-byte values,
> VERSIONS=1, everything filtered at the server with a ValueFilter. All times
> are in seconds.
> Without patch:
> ||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
> |6.4|8.5|14.3|14.6|11.1|20.3|
> With patch:
> ||Wildcard||Col 1||Col 2||Col 4||Col 5||Col 2+4||
> |6.4|8.4|8.9|9.9|6.4|10.0|
> Variation here was +/- 0.2s.
> So with this patch scanning is up to 2x faster than without (Col 2+4 drops
> from 20.3s to 10.0s), and never slower in these runs. No special hint is
> needed beyond declaring VERSIONS correctly.
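
For comparison with the quoted description, the static VERSIONS hint could be sketched as below. This is an illustration under assumptions, not the attached patch: the class and method names are hypothetical, and the cutoff of 10 is only the "some value < 10" floated in the description.

```java
// Sketch only: VersionsHintSketch, Hint and ASSUMED_SKIP_THRESHOLD are
// hypothetical; the real patch may structure this decision differently.
class VersionsHintSketch {
    /** Stand-in for the scanner's "what to do next" decision. */
    enum Hint { SKIP, SEEK_NEXT_COL }

    // Assumed cutoff, taken from "some value < 10" in the description.
    static final int ASSUMED_SKIP_THRESHOLD = 10;

    /**
     * Decide how to leave a column once the wanted version has been read.
     * maxVersions is the column family's VERSIONS setting.
     */
    static Hint afterMatchedColumn(int maxVersions) {
        return maxVersions < ASSUMED_SKIP_THRESHOLD
                ? Hint.SKIP            // few versions declared: skip cell by cell
                : Hint.SEEK_NEXT_COL;  // many versions possible: a seek is worth it
    }

    public static void main(String[] args) {
        System.out.println(afterMatchedColumn(1));
        System.out.println(afterMatchedColumn(100));
    }
}
```

Unlike a per-column counter, this decision depends only on the declared VERSIONS value, so it cannot react when a column holds far more cells than the setting suggests.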
--
This message was sent by Atlassian JIRA
(v6.1#6144)