[ 
https://issues.apache.org/jira/browse/HBASE-2496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-2496:
--------------------------------------

    Attachment: HBASE-2496.patch

Patch that fixes 2 issues:

 - HRS: in next() we instantiate the ArrayList (AL) with the default size, 
which is inefficient for big caching values. We are also re-creating ALs in a 
loop for _every_ scanned row when we could just reuse the same instance and 
clear it.

 - ExplicitColumnTracker: reset() calls buildColumnList(), which creates a new 
AL with new ColumnCounts after every row. This is unnecessary and possibly 
slows down long scans. I added a way to reset the count in ColumnCount, and I 
keep a reference to all of them that I then reuse in buildColumnList(). I also 
added a couple of finals.
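
A minimal sketch of the two ideas above, not the patch itself. The class names 
ResettableCount and ScanReuseSketch are hypothetical, and rows are modeled as 
plain lists of column names rather than real HBase types. It only illustrates 
sizing the result list from the caching value, clearing one per-row list 
instead of allocating a new one, and resetting counters in place.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for a per-column counter; not the real ColumnCount. */
final class ResettableCount {
    private final String column;
    private int count;

    ResettableCount(final String column) {
        this.column = column;
    }

    String getColumn() { return column; }
    int getCount() { return count; }
    void increment() { count++; }

    /** Lets the same instance be reused for the next row. */
    void reset() { count = 0; }
}

final class ScanReuseSketch {

    /** Returns one comma-joined summary of matched columns per scanned row. */
    static List<String> scan(final List<List<String>> rows,
                             final List<ResettableCount> tracked,
                             final int caching) {
        // Sized once from the caching value, so it does not grow step by step.
        final List<String> results = new ArrayList<>(caching);
        // Allocated once and cleared per row instead of re-created per row.
        final List<String> rowCells = new ArrayList<>();

        for (final List<String> row : rows) {
            if (results.size() >= caching) {
                break;
            }
            rowCells.clear();
            for (final ResettableCount c : tracked) {
                c.reset(); // reuse the counters rather than building new ones
            }
            for (final String cell : row) {
                for (final ResettableCount c : tracked) {
                    if (c.getColumn().equals(cell)) {
                        c.increment();
                        rowCells.add(cell);
                    }
                }
            }
            results.add(String.join(",", rowCells));
        }
        return results;
    }

    public static void main(final String[] args) {
        final List<ResettableCount> tracked = Arrays.asList(
            new ResettableCount("a"), new ResettableCount("b"));
        final List<List<String>> rows = Arrays.asList(
            Arrays.asList("a", "b", "x"),
            Arrays.asList("b", "b"),
            Arrays.asList("x"));
        System.out.println(scan(rows, tracked, 10));
    }
}
```

The trade-off is the usual one for reused buffers: the caller must not hold on 
to rowCells across rows, since its contents are overwritten on the next 
iteration.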

I tested this patch on 166M rows with a modified RowCounter that doesn't use 
block caching and that caches 10k rows. I ran each version (without and with 
the patch) 3 times, major compacting the table before counting and restarting 
between the 2 runs (but not restarting HDFS).

With the patch:
3mins, 34sec
3mins, 19sec
3mins, 22sec

Without the patch:
4mins, 36sec
3mins, 56sec
3mins, 55sec
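
As a rough reading of the numbers above, the snippet below averages the three 
runs of each variant and computes the relative time saved. The class and 
method names are made up for illustration. With only three runs per variant 
this is an indication rather than a rigorous benchmark, and the first run 
without the patch is noticeably slower than the other two.

```java
final class ScanTimingSummary {
    // Run times from the lists above, converted to seconds.
    static final int[] WITH_PATCH = {
        seconds(3, 34), seconds(3, 19), seconds(3, 22)
    };
    static final int[] WITHOUT_PATCH = {
        seconds(4, 36), seconds(3, 56), seconds(3, 55)
    };

    static int seconds(final int minutes, final int secs) {
        return minutes * 60 + secs;
    }

    static double mean(final int... values) {
        long sum = 0;
        for (final int v : values) {
            sum += v;
        }
        return sum / (double) values.length;
    }

    /** Percentage of wall-clock time saved going from before to after. */
    static double percentSaved(final double before, final double after) {
        return (before - after) / before * 100.0;
    }

    public static void main(final String[] args) {
        final double with = mean(WITH_PATCH);
        final double without = mean(WITHOUT_PATCH);
        System.out.printf("mean with patch:    %.1f s%n", with);
        System.out.printf("mean without patch: %.1f s%n", without);
        System.out.printf("time saved:         %.1f%%%n",
            percentSaved(without, with));
    }
}
```

The means come out to 205 s with the patch and 249 s without, or a little 
under 18% less time per scan.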

> Less ArrayList churn on the scan path
> -------------------------------------
>
>                 Key: HBASE-2496
>                 URL: https://issues.apache.org/jira/browse/HBASE-2496
>             Project: Hadoop HBase
>          Issue Type: Improvement
>    Affects Versions: 0.20.3
>            Reporter: Jean-Daniel Cryans
>            Assignee: Jean-Daniel Cryans
>             Fix For: 0.20.5, 0.21.0
>
>         Attachments: HBASE-2496.patch
>
>
> Doing some profiling when testing the scanning speed of 0.20.4, I saw that we 
> are spending a lot of time instantiating ArrayLists when scanning and that we 
> could sometimes set the right size of the arrays. I don't expect big 
> improvements for short scans, but people like us who are scanning in batches 
> of 10k could get some nice speedups.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
