[ 
https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795481#comment-13795481
 ] 

Vladimir Rodionov commented on HBASE-9769:
------------------------------------------

Optimization described with initial results:

{code}
Yes, I load data into HRegion (with CACHE_ON_WRITE), then call flushcache() (no 
data in memstore).

This is what I found: the default implementation of ExplicitColumnMatcher is 
(possibly) tuned for very large rows. We need a scan hint that tells 
StoreScanner which strategy to use:

1. For very large rows: ExplicitColumnMatcher with reseeks (what we have 
currently).
2. For small/medium rows: remove the explicit columns/families from the Scan 
and replace them with an additional filter that keeps the columnFamilyMap from 
the Scan and verifies that every KV matches this map.

I have created such a filter (ExplicitColumnsFilter) and verified that it works 
much better than option 1 for small/medium rows. For 1 CF + 5 CQs and a Scan 
with 2 CQs I have:

400K rows per sec with default
1.45M rows per sec with ExplicitScanReplacementFilter (about 3.6x the default)

The raw scanner (no columns specified) runs at 1.6M rows per sec. With the 
filter, it's just a ~10% performance hit to run the scanner with 2 explicit 
column qualifiers.
{code}
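
The per-KV check described in option 2 can be sketched roughly as follows. This is a plain-Java model under stated assumptions, not the actual ExplicitColumnsFilter from the comment: the class name ExplicitColumnsSketch, its methods, and the convention that an empty qualifier set means "whole family" are all hypothetical, and no HBase classes are used. It only illustrates the idea of keeping a family-to-qualifiers map and testing each cell against it instead of reseeking.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical plain-Java model of the idea; it does not use HBase classes.
final class ExplicitColumnsSketch {

    // Unsigned lexicographic byte[] ordering, so arrays can serve as map keys.
    static final Comparator<byte[]> BYTES = Arrays::compareUnsigned;

    // family -> requested qualifiers. Assumption of this sketch: an empty set
    // means "every qualifier in the family".
    private final Map<byte[], NavigableSet<byte[]>> familyMap = new TreeMap<>(BYTES);

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Request a whole family (replaces any earlier qualifier list for it).
    ExplicitColumnsSketch addFamily(String family) {
        familyMap.put(bytes(family), new TreeSet<>(BYTES));
        return this;
    }

    // Request a single family:qualifier column.
    ExplicitColumnsSketch addColumn(String family, String qualifier) {
        familyMap.computeIfAbsent(bytes(family), f -> new TreeSet<>(BYTES))
                 .add(bytes(qualifier));
        return this;
    }

    // Called once per cell: keep it only if its family and qualifier were requested.
    boolean matches(byte[] family, byte[] qualifier) {
        NavigableSet<byte[]> qualifiers = familyMap.get(family);
        if (qualifiers == null) {
            return false; // family was never requested
        }
        return qualifiers.isEmpty() || qualifiers.contains(qualifier);
    }
}
```

For the 1 CF + 5 CQs case above, a caller would register the two wanted qualifiers with addColumn and then call matches on every cell the scanner returns, dropping the other three. The trade-off is that every cell is read and checked, which is presumably why this only pays off when rows are small enough that reseeking costs more than scanning through.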

> Improve Scanner with explicit column list performance when rows are 
> small/medium size
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-9769
>                 URL: https://issues.apache.org/jira/browse/HBASE-9769
>             Project: HBase
>          Issue Type: Improvement
>          Components: Scanners
>    Affects Versions: 0.98.0, 0.94.12, 0.96.0
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)
