[
https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795473#comment-13795473
]
Vladimir Rodionov commented on HBASE-9769:
------------------------------------------
LarsH:
{code}
Interesting. Thanks for doing the testing/profiling Vladimir!
Generally reseeks are better if they can skip many KVs.
For example if you have many versions of the same row/col, INCLUDE_NEXT_COL
will be better than issuing many INCLUDEs, same with INCLUDE_NEXT_ROW if there
are many columns.
Since the number of columns/versions is not known at scan time (and can in fact
vary between rows) it is hard to always do the right thing. It also depends on
how large the KVs are on average. So replacing INCLUDE_NEXT_XXX with INCLUDE is
not always the right idea.
Thinking aloud... We could take the VERSIONS setting of the column family into
account as a guideline for the expected number of versions (but there's no
guarantee about how many versions we'll actually have until we've had a
compaction), and replace INCLUDE_NEXT_COL with INCLUDE if VERSIONS is small
(maybe < 10 or so). Maybe that'd be worth a jira...
There are some fixes in 0.94.12 (HBASE-8930, avoid a superfluous reseek in some
cases), and HBASE-9732 might help in 0.94.13 (avoid memory fences on a
volatile on each seek/reseek).
It also would be nice to figure out why reseek is so much more expensive. If
the KV we reseek to is on the same block it should just scan forward, otherwise
it'll look in the appropriate block. It probably is the creation of the fake KV
we want to seek to (like firstOnRow, lastOnRow, etc), in which case there's not
much we can do.
Lastly, I've not spent much time profiling the ExplicitColumnMatcher yet;
looks like I should start doing that.
{code}
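The VERSIONS-based idea quoted above could be sketched roughly as follows. This is only an illustration of the decision itself, not HBase code: the class ReseekHeuristic, the simplified MatchCode enum, the adjust method and the threshold constant are all hypothetical names, and the cutoff of 10 is simply the "maybe < 10 or so" guess from the comment. The scanner would still have to deal with any remaining versions of the column that a plain INCLUDE leaves in front of it.

```java
// Hypothetical sketch; none of these names are HBase's actual API.
final class ReseekHeuristic {

    /** Simplified stand-in for the scanner's match codes, as named in the comment. */
    enum MatchCode { INCLUDE, INCLUDE_NEXT_COL, INCLUDE_NEXT_ROW }

    /** Assumed cutoff; the comment only suggests "maybe < 10 or so". */
    static final int SMALL_VERSIONS_THRESHOLD = 10;

    /**
     * Downgrades INCLUDE_NEXT_COL to INCLUDE when the column family's VERSIONS
     * setting is small, on the assumption that a reseek then skips too few KVs
     * to pay for itself. All other codes are returned unchanged.
     */
    static MatchCode adjust(MatchCode code, int maxVersions) {
        if (code == MatchCode.INCLUDE_NEXT_COL && maxVersions < SMALL_VERSIONS_THRESHOLD) {
            return MatchCode.INCLUDE;
        }
        return code;
    }
}
```

With VERSIONS set to 3, adjust(MatchCode.INCLUDE_NEXT_COL, 3) would yield INCLUDE, while a family configured with 100 versions would keep the reseek. VERSIONS is only an upper bound that holds after compaction, so this remains a guideline rather than a guarantee.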
> Improve Scanner with explicit column list performance when rows are
> small/medium size
> -------------------------------------------------------------------------------------
>
> Key: HBASE-9769
> URL: https://issues.apache.org/jira/browse/HBASE-9769
> Project: HBase
> Issue Type: Improvement
> Components: Scanners
> Affects Versions: 0.98.0, 0.94.12, 0.96.0
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
>