[ 
https://issues.apache.org/jira/browse/HBASE-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13795469#comment-13795469
 ] 

Vladimir Rodionov commented on HBASE-9769:
------------------------------------------

From dev-list:
{code}
I modified the tests.
I created a table with one CF and 5 columns: CQ1, ..., CQ5, and ran two scans:
1. Scan.addColumn(CF, CQ1);
   Scan.addColumn(CF, CQ3);
2. Scan.addFamily(CF);
Scan performance from the block cache:
1. 400K rows per sec
2. 1.6M rows per sec
The explicit-column scan performs even worse in this case: it is 4x slower
than the full-family scan. It is much faster to scan the WHOLE rows and filter
columns later in a Filter than to specify the columns directly in a Scan.
{code}
I profiled the last test case (5 columns total, 2 of them in the scan).

Roughly 80% of the StoreScanner.next() execution time is spent in:

StoreScanner.reseek() - 71%
ScanQueryMatcher.getKeyForNextColumn() - 6%
ScanQueryMatcher.getKeyForNextRow() - 2%
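
The numbers above suggest that, for small rows, positioning on each requested column can cost more than simply reading every cell and discarding the unwanted ones. The following is only a toy cost model of that idea, not HBase code: the class SeekVsScanModel, its methods, and the "row|qualifier" key layout are all hypothetical, and a binary search over an in-memory list stands in for a seek (the real StoreScanner.reseek() works differently and has other overheads). It counts key comparisons for a seek-per-column strategy against cell visits for a scan-and-filter strategy on rows with 5 columns, 2 of them requested.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: NOT HBase code. All names here are hypothetical.
class SeekVsScanModel {

    // Builds sorted cell keys "rowNNNNNN|CQn" for `rows` rows, qualifiers CQ1..CQ5.
    // Zero-padded row ids keep generation order equal to lexicographic order.
    public static List<String> buildCells(int rows) {
        List<String> cells = new ArrayList<>();
        for (int r = 0; r < rows; r++) {
            for (int q = 1; q <= 5; q++) {
                cells.add(String.format("row%06d|CQ%d", r, q));
            }
        }
        return cells;
    }

    // Strategy 1: one positioned lookup (binary search) per wanted column per row.
    // Returns {cells found, key comparisons performed}.
    public static long[] seekPerColumn(List<String> cells, int rows, String[] wanted) {
        long found = 0;
        long comparisons = 0;
        for (int r = 0; r < rows; r++) {
            for (String qualifier : wanted) {
                String key = String.format("row%06d|%s", r, qualifier);
                int lo = 0;
                int hi = cells.size() - 1;
                while (lo <= hi) {
                    int mid = (lo + hi) >>> 1;
                    int c = cells.get(mid).compareTo(key);
                    comparisons++;
                    if (c == 0) {
                        found++;
                        break;
                    }
                    if (c < 0) {
                        lo = mid + 1;
                    } else {
                        hi = mid - 1;
                    }
                }
            }
        }
        return new long[] {found, comparisons};
    }

    // Strategy 2: visit every cell once and keep only the wanted qualifiers.
    // Returns {cells found, cells visited}.
    public static long[] scanAndFilter(List<String> cells, Set<String> wanted) {
        long found = 0;
        long visited = 0;
        for (String cell : cells) {
            visited++;
            String qualifier = cell.substring(cell.indexOf('|') + 1);
            if (wanted.contains(qualifier)) {
                found++;
            }
        }
        return new long[] {found, visited};
    }

    public static void main(String[] args) {
        int rows = 1000;
        List<String> cells = buildCells(rows);
        long[] seek = seekPerColumn(cells, rows, new String[] {"CQ1", "CQ3"});
        long[] scan = scanAndFilter(cells, new HashSet<>(Arrays.asList("CQ1", "CQ3")));
        System.out.println("seek-per-column: found=" + seek[0] + " comparisons=" + seek[1]);
        System.out.println("scan-and-filter: found=" + scan[0] + " visited=" + scan[1]);
    }
}
```

Both strategies return the same cells. In this model the scan-and-filter strategy touches each of the 5 cells per row exactly once, while every seek pays a lookup cost that grows with the size of the data, so with only 5 columns per row the seeks do more total work. The balance would be expected to shift for wide rows, where a seek skips many cells at once.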

> Improve Scanner with explicit column list performance when rows are 
> small/medium size
> -------------------------------------------------------------------------------------
>
>                 Key: HBASE-9769
>                 URL: https://issues.apache.org/jira/browse/HBASE-9769
>             Project: HBase
>          Issue Type: Improvement
>          Components: Scanners
>    Affects Versions: 0.98.0, 0.94.12, 0.96.0
>            Reporter: Vladimir Rodionov
>            Assignee: Vladimir Rodionov
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)
