[
https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795481#comment-13795481
]
Vladimir Rodionov commented on HBASE-9769:
------------------------------------------
Optimization described with initial results:
{code}
Yes, I load data into HRegion (with CACHE_ON_WRITE), then call flushcache() (no
data in memstore).
This is what I found: the default implementation of ExplicitColumnMatcher is
(possibly) tuned for very large rows. We need a scan hint that tells
StoreScanner which strategy to use:
1. For very large rows: ExplicitColumnMatcher with reseeks (what we have
currently).
2. For small/medium rows: remove the explicit columns/families from the Scan
and replace them with an additional filter that keeps the columnFamilyMap from
the scan and verifies that every KV matches this map.
I have created such a filter (ExplicitColumnsFilter) and verified that it works
much better than option 1 for small/medium rows. For 1 CF + 5 CQs and a Scan
with 2 CQs I have:
400K rows per sec with default
1.45M rows per sec with ExplicitScanReplacementFilter (more than 3.5x the
default throughput)
Raw scanner (no columns specified) runs at 1.6M rows per sec. That is only
about a 10% performance hit for running the scanner with 2 explicit column
qualifiers.
{code}
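For readers who want to see the shape of option 2, here is a minimal sketch of
the per-KV check. It is not the ExplicitColumnsFilter from this issue and it
does not use the HBase Filter API: the class ExplicitColumnsSketch, its Cell
type and the use of String keys are all hypothetical, chosen only so the
example runs on its own. It assumes the filter holds a family -> qualifiers map
copied from the Scan and tests every cell against it instead of reseeking.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical illustration of "check every KV against the column map". */
final class ExplicitColumnsSketch {

    /** Simplified cell: only the coordinates needed for the check. */
    static final class Cell {
        final String row;
        final String family;
        final String qualifier;

        Cell(String row, String family, String qualifier) {
            this.row = row;
            this.family = family;
            this.qualifier = qualifier;
        }
    }

    // family -> requested qualifiers (assumed to be copied from the Scan)
    private final Map<String, NavigableSet<String>> familyMap = new TreeMap<>();

    /** Registers one requested column; returns this for chaining. */
    ExplicitColumnsSketch addColumn(String family, String qualifier) {
        familyMap.computeIfAbsent(family, f -> new TreeSet<>()).add(qualifier);
        return this;
    }

    /** True if the cell belongs to one of the requested columns. */
    boolean include(Cell cell) {
        NavigableSet<String> qualifiers = familyMap.get(cell.family);
        return qualifiers != null && qualifiers.contains(cell.qualifier);
    }

    /** Linear pass over all cells: no reseek, just one lookup per cell. */
    List<Cell> scan(List<Cell> cells) {
        List<Cell> out = new ArrayList<>();
        for (Cell cell : cells) {
            if (include(cell)) {
                out.add(cell);
            }
        }
        return out;
    }
}
```

With 1 CF and 5 CQs per row, registering 2 CQs keeps 2 of every 5 cells. The
cost is one map lookup per cell, which is the trade-off the numbers above
suggest pays off when rows are small enough that a reseek costs more than
reading the skipped cells.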
> Improve Scanner with explicit column list performance when rows are
> small/medium size
> -------------------------------------------------------------------------------------
>
> Key: HBASE-9769
> URL: https://issues.apache.org/jira/browse/HBASE-9769
> Project: HBase
> Issue Type: Improvement
> Components: Scanners
> Affects Versions: 0.98.0, 0.94.12, 0.96.0
> Reporter: Vladimir Rodionov
> Assignee: Vladimir Rodionov
>