[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15414885#comment-15414885
 ] 

ramkrishna.s.vasudevan commented on HBASE-16213:
------------------------------------------------

The perf improvement is great. With smaller blocks and bigger values the 
impact is smaller, because each block then holds only a few rows and the seek 
does not take much time anyway. 
I think the metadata overhead is at most 4k more. 
Having multiple columns for the same row should incur only the same metadata 
overhead (if the total size comes to approx 1K).
Went through the patch. 
I think some of the tag-related decode and encode code can be moved to a 
subclass to avoid duplicating the existing code.
Also check whether the SeekState's Cell impl really needs to be altogether new 
in the new EncodedSeeker state implementation. Maybe the existing ones can be 
reused. I have not checked whether something differs that prevents reuse.
I think all the existing tests for DBE would work with this, because they 
iterate through all values of the DBE enum, which will include the new one. Do 
you need any specific test cases for these new types?

> A new HFileBlock structure for fast random get
> ----------------------------------------------
>
>                 Key: HBASE-16213
>                 URL: https://issues.apache.org/jira/browse/HBASE-16213
>             Project: HBase
>          Issue Type: New Feature
>          Components: Performance
>            Reporter: binlijin
>            Assignee: binlijin
>         Attachments: HBASE-16213-master_v1.patch, HBASE-16213.patch, 
> HBASE-16213_branch1_v3.patch, HBASE-16213_v2.patch, hfile-cpu.png, 
> hfile_block_performance.pptx, new-hfile-block.xlsx
>
>
> HFileBlock stores cells sequentially. Currently, to get a row from the block, 
> it scans from the first cell until it reaches the row's cell.
> The new structure stores every row's start offset with the data, so it can 
> find the exact row with a binary search.
> I used EncodedSeekPerformanceTest to test the performance.
> First, use YCSB to write 1,000,000 rows; every row has only one qualifier, and 
> valueLength=16B/64B/256B/1k.
> Then use EncodedSeekPerformanceTest to test random reads of 10,000 or 
> 1,000,000 rows, and also record the HFileBlock's dataSize/dataWithMetaSize for 
> the encoding.
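
To make the idea in the description concrete, here is a minimal sketch of a 
block that keeps every row's start offset next to the data and uses a binary 
search over those offsets, with the sequential scan shown for comparison. It is 
not the code from the patch: the class name RowIndexedBlockSketch, the record 
layout (4-byte row length, row, 4-byte value length, value) and the method names 
are all assumptions made for illustration, and real HFileBlocks store full cells 
rather than plain row/value pairs.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the HBASE-16213 implementation.
public class RowIndexedBlockSketch {
    private final ByteBuffer data;  // rows stored back to back, sorted by row key
    private final int[] rowOffsets; // start offset of every row inside data

    /** Builds a block from row keys (already sorted) and their values. */
    public RowIndexedBlockSketch(String[] sortedRows, String[] values) {
        int n = sortedRows.length;
        byte[][] rows = new byte[n][];
        byte[][] vals = new byte[n][];
        int size = 0;
        for (int i = 0; i < n; i++) {
            rows[i] = sortedRows[i].getBytes(StandardCharsets.UTF_8);
            vals[i] = values[i].getBytes(StandardCharsets.UTF_8);
            size += 2 * Integer.BYTES + rows[i].length + vals[i].length;
        }
        data = ByteBuffer.allocate(size);
        rowOffsets = new int[n];
        for (int i = 0; i < n; i++) {
            rowOffsets[i] = data.position(); // remember where this row starts
            data.putInt(rows[i].length).put(rows[i]).putInt(vals[i].length).put(vals[i]);
        }
    }

    /** Reads the length-prefixed byte array that starts at pos. */
    private byte[] read(int pos) {
        byte[] out = new byte[data.getInt(pos)];
        for (int i = 0; i < out.length; i++) {
            out[i] = data.get(pos + Integer.BYTES + i);
        }
        return out;
    }

    private String valueAt(int rowOffset) {
        int valuePos = rowOffset + Integer.BYTES + data.getInt(rowOffset);
        return new String(read(valuePos), StandardCharsets.UTF_8);
    }

    /** Unsigned lexicographic comparison of two byte arrays. */
    private static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    /** Current behaviour: scan from the first row until the target is reached. */
    public String sequentialGet(String row) {
        byte[] target = row.getBytes(StandardCharsets.UTF_8);
        int pos = 0;
        while (pos < data.capacity()) {
            int cmp = compare(read(pos), target);
            if (cmp == 0) return valueAt(pos);
            if (cmp > 0) return null; // passed the place where the row would be
            int valuePos = pos + Integer.BYTES + data.getInt(pos);
            pos = valuePos + Integer.BYTES + data.getInt(valuePos);
        }
        return null;
    }

    /** Proposed behaviour: binary search over the stored row start offsets. */
    public String indexedGet(String row) {
        byte[] target = row.getBytes(StandardCharsets.UTF_8);
        int lo = 0;
        int hi = rowOffsets.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = compare(read(rowOffsets[mid]), target);
            if (cmp == 0) return valueAt(rowOffsets[mid]);
            if (cmp < 0) lo = mid + 1; else hi = mid - 1;
        }
        return null;
    }

    public static void main(String[] args) {
        RowIndexedBlockSketch block = new RowIndexedBlockSketch(
            new String[] {"row1", "row3", "row5"}, new String[] {"a", "b", "c"});
        System.out.println(block.sequentialGet("row3"));
        System.out.println(block.indexedGet("row3"));
    }
}
```

In this sketch the index costs one 4-byte offset per row, so the relative 
overhead shrinks as values get larger, while the benefit of the binary search 
grows with the number of rows held in one block.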



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
