[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15383619#comment-15383619
 ] 

stack commented on HBASE-16213:
-------------------------------

Nice. Seeking within the row during random reads is one of the main consumers 
of CPU.

Why bother having two encoders? Why not just one that does row and column 
family index?

Any idea how much more work we are doing when this is enabled (CPU)? Is it 
less with this feature on, or more? Under what circumstances, do you think?

Let me try this. In the meantime, here are some comments on the patch:

In the class comment, in either the encoder or the decoder, describe how the 
encoding works and what the layout looks like, with some advice on when to use 
it. You can then copy and paste it as the release note on this issue.

For...

            builder.write(cell);

Could the above return a length so you don't have to get it again on the next 
line with:

    int size = KeyValueUtil.length(cell);

Parsing the length has a cost.
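
A minimal sketch of that suggestion, assuming a hypothetical builder (the class 
name, method signature, and on-disk layout below are illustrative only and are 
not taken from the patch or from HBase): write() reports how many bytes it 
appended, so the caller never parses the length a second time.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical builder, not the one in the patch.
class LengthReturningBuilder {
    private final ByteArrayOutputStream out = new ByteArrayOutputStream();

    // Writes a 4-byte big-endian int; write(int) keeps only the low 8 bits.
    private void writeInt(int v) {
        out.write(v >>> 24);
        out.write(v >>> 16);
        out.write(v >>> 8);
        out.write(v);
    }

    /** Appends [keyLen][valueLen][key][value] and returns the bytes written. */
    int write(byte[] key, byte[] value) {
        writeInt(key.length);
        writeInt(value.length);
        out.write(key, 0, key.length);
        out.write(value, 0, value.length);
        return 8 + key.length + value.length;
    }

    int size() {
        return out.size();
    }
}
```

The caller would then do something like "int size = builder.write(key, value);" 
in place of the separate length lookup.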

Is there anywhere you can get a count of how many kvs are in the block, so 
that you can presize the list here:

      List<ByteBuffer> kvs = new ArrayList<ByteBuffer>();
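
For illustration, presizing might look like the sketch below. It assumes the 
count is available up front; cellCount and the PresizedList class are 
hypothetical names, not something the patch is known to expose.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

class PresizedList {
    // cellCount is assumed to be known before the loop (hypothetical input).
    static List<ByteBuffer> collect(ByteBuffer[] cells, int cellCount) {
        // Passing the capacity avoids repeated growth of the backing array.
        List<ByteBuffer> kvs = new ArrayList<>(cellCount);
        for (int i = 0; i < cellCount; i++) {
            kvs.add(cells[i]);
        }
        return kvs;
    }
}
```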

Remove these...

    // TODO Auto-generated method stub

Put these together?


          LOG.trace("RowNumber: " + rowsOffset.size());
          LOG.trace("onDiskSize: " + onDiskSize);

One line is easier to read than two...

Got half way through... will be back w/ more. Nice.


> A new HFileBlock structure for fast random get
> ----------------------------------------------
>
>                 Key: HBASE-16213
>                 URL: https://issues.apache.org/jira/browse/HBASE-16213
>             Project: HBase
>          Issue Type: New Feature
>          Components: Performance
>            Reporter: binlijin
>            Assignee: binlijin
>         Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_v2.patch
>
>
> HFileBlock stores cells sequentially. Currently, to get a row from the block, 
> it scans from the first cell until it reaches the row's cell.
> The new structure stores every row's start offset with the data, so it can 
> find the exact row with a binary search.
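
The lookup described in the issue summary can be sketched roughly as follows. 
This is only an illustration under stated assumptions: the block layout 
(a 2-byte row length followed by the row bytes), the RowIndexSketch class, and 
its method names are hypothetical and do not reflect the actual encoding in 
the patch.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

class RowIndexSketch {
    // Builds a toy block of [2-byte rowLen][row bytes] entries from sorted
    // rows and records each row's start offset in rowOffsets.
    static ByteBuffer buildBlock(String[] rows, int[] rowOffsets) {
        int total = 0;
        for (String r : rows) {
            total += 2 + r.getBytes(StandardCharsets.UTF_8).length;
        }
        ByteBuffer block = ByteBuffer.allocate(total);
        for (int i = 0; i < rows.length; i++) {
            byte[] b = rows[i].getBytes(StandardCharsets.UTF_8);
            rowOffsets[i] = block.position();
            block.putShort((short) b.length);
            block.put(b);
        }
        return block;
    }

    // Unsigned lexicographic compare of the row stored at offset with key.
    static int compareRowAt(ByteBuffer block, int offset, byte[] key) {
        int rowLen = block.getShort(offset) & 0xFFFF;
        int n = Math.min(rowLen, key.length);
        for (int i = 0; i < n; i++) {
            int a = block.get(offset + 2 + i) & 0xFF;
            int b = key[i] & 0xFF;
            if (a != b) {
                return a - b;
            }
        }
        return rowLen - key.length;
    }

    // Binary search over the row offsets instead of scanning from the first
    // cell. Returns the row index, or -(insertionPoint + 1) if absent.
    static int findRow(ByteBuffer block, int[] rowOffsets, byte[] key) {
        int lo = 0;
        int hi = rowOffsets.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = compareRowAt(block, rowOffsets[mid], key);
            if (cmp == 0) {
                return mid;
            }
            if (cmp < 0) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return -(lo + 1);
    }
}
```

With N rows in a block, this does O(log N) row comparisons per get rather 
than a linear scan over the cells that precede the target row.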



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
