[ 
https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795473#comment-13795473
 ] 

Vladimir Rodionov commented on HBASE-9769:
------------------------------------------

LarsH:

{code}
Interesting. Thanks for doing the testing/profiling Vladimir!

Generally reseeks are better if they can skip many KVs.

For example if you have many versions of the same row/col, INCLUDE_NEXT_COL 
will be better than issuing many INCLUDEs, same with INCLUDE_NEXT_ROW if there 
are many columns.

Since the number of columns/versions is not known at scan time (and can in fact 
vary between rows) it is hard to always do the right thing. It also depends on 
how large the KVs are on average. So replacing INCLUDE_NEXT_XXX with INCLUDE is 
not always the right idea.

Thinking aloud... We could take the VERSIONS setting of the column family into 
account as a guideline for the expected number of versions (but there's no 
guarantee about how many versions we'll actually have until we've had a 
compaction), and replace INCLUDE_NEXT_COL with INCLUDE if VERSIONS is small 
(maybe < 10 or so). Maybe that'd be worth a jira...


There are some fixes in 0.94.12 (HBASE-8930, avoid a superfluous reseek in some 
cases), and HBASE-9732 might help in 0.94.13 (avoid memory fences on a 
volatile on each seek/reseek).

It also would be nice to figure out why reseek is so much more expensive. If 
the KV we reseek to is on the same block it should just scan forward, otherwise 
it'll look in the appropriate block. It probably is the creation of the fake KV 
we want to seek to (like firstOnRow, lastOnRow, etc), in which case there's not 
much we can do.

Lastly, I've not spent much time profiling the ExplicitColumnMatcher yet; 
looks like I should start doing that.

{code}
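
To make the VERSIONS idea quoted above concrete, here is a minimal standalone sketch. It does not use HBase's own classes: the enum, the method name and the threshold constant are all hypothetical, and the cutoff of 10 is just the "maybe < 10 or so" guess from the comment, not a measured value.

```java
public class MatchCodeHeuristic {
    // Stand-in for the two match codes named in the comment; not HBase's own enum.
    public enum MatchCode { INCLUDE, INCLUDE_NEXT_COL }

    // Assumed cutoff, taken from the "maybe < 10 or so" guess; would need tuning.
    public static final int SMALL_VERSIONS_THRESHOLD = 10;

    // Picks the code to return once the last wanted version of a column is seen.
    // maxVersions is the column family's configured VERSIONS setting, which is
    // only a hint: more versions may exist on disk until a compaction has run.
    public static MatchCode afterLastWantedVersion(int maxVersions) {
        if (maxVersions < SMALL_VERSIONS_THRESHOLD) {
            // Few versions expected: stepping over them is likely cheaper than a reseek.
            return MatchCode.INCLUDE;
        }
        // Many versions possible: a reseek to the next column can skip them all.
        return MatchCode.INCLUDE_NEXT_COL;
    }

    public static void main(String[] args) {
        System.out.println(afterLastWantedVersion(3));
        System.out.println(afterLastWantedVersion(1000));
    }
}
```

Since VERSIONS is a per-family setting, a choice like this could be made once per scanner rather than per KV, so it would add no cost to the hot path.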

> Improve Scanner with explicit column list performance when rows are 
> small/medium size
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-9769
>                 URL: https://issues.apache.org/jira/browse/HBASE-9769
>             Project: HBase
>          Issue Type: Improvement
>          Components: Scanners
>    Affects Versions: 0.98.0, 0.94.12, 0.96.0
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)
