[ 
https://issues.apache.org/jira/browse/HBASE-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15434803#comment-15434803
 ] 

Anoop Sam John commented on HBASE-16213:
----------------------------------------

One thought after one more look:
bq. private List<Integer> rowsOffset = new ArrayList<Integer>(64);
So we add all row offsets into this List and then finally write all the ints to 
the HFile block's stream.  Every addition to the List needs an Object creation 
(int to Integer autoboxing), generating a lot of garbage.  We can avoid this.
Instead of a List we can create a ByteArrayOutputStream (see 
org.apache.hadoop.hbase.io.BAOS) and write the offsets in their final 
serialized form, then write getBuffer() once at the end.  The capacity of the 
BAOS can be initialized to 64 * 4; it will resize automatically as needed.  
Also, #rows can be calculated as BAOS#size()/4.
WDYT?
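A minimal sketch of the suggestion, using java.io.ByteArrayOutputStream wrapped in a DataOutputStream in place of HBase's internal BAOS (the 64 * 4 initial capacity and the size()/4 row count follow the comment; the sample offsets and variable names are illustrative):

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class RowOffsetsSketch {
    public static void main(String[] args) throws IOException {
        // Initial capacity of 64 * 4 bytes, as suggested; grows as needed.
        ByteArrayOutputStream baos = new ByteArrayOutputStream(64 * 4);
        DataOutputStream out = new DataOutputStream(baos);

        // Write each row offset directly as a 4-byte int -- no Integer
        // autoboxing, no per-element Object allocation.
        int[] sampleOffsets = {0, 120, 250, 400};
        for (int offset : sampleOffsets) {
            out.writeInt(offset);
        }

        // Row count can be recovered as size() / 4 (4 bytes per int).
        int rowCount = baos.size() / 4;
        System.out.println("rows=" + rowCount + " bytes=" + baos.size());

        // At the end, the whole buffer is written to the block's stream in
        // one call, e.g. blockStream.write(baos.toByteArray(), 0, baos.size());
    }
}
```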

> A new HFileBlock structure for fast random get
> ----------------------------------------------
>
>                 Key: HBASE-16213
>                 URL: https://issues.apache.org/jira/browse/HBASE-16213
>             Project: HBase
>          Issue Type: New Feature
>          Components: Performance
>            Reporter: binlijin
>            Assignee: binlijin
>         Attachments: HBASE-16213-master_v1.patch, 
> HBASE-16213-master_v3.patch, HBASE-16213-master_v4.patch, 
> HBASE-16213-master_v5.patch, HBASE-16213.patch, HBASE-16213_branch1_v3.patch, 
> HBASE-16213_v2.patch, cpu_blocksize_64K_valuelength_16B.png, 
> cpu_blocksize_64K_valuelength_256B.png, 
> cpu_blocksize_64K_valuelength_64B.png, hfile-cpu.png, 
> hfile_block_performance.pptx, hfile_block_performance2.pptx, 
> new-hfile-block.xlsx, qps_blocksize_64K_valuelength_16B.png, 
> qps_blocksize_64K_valuelength_256B.png, qps_blocksize_64K_valuelength_64B.png
>
>
> HFileBlock stores cells sequentially; currently, to get a row from the block, 
> it scans from the first cell until it reaches the row's cell.
> The new structure stores every row's start offset along with the data, so it 
> can find the exact row with a binarySearch.
> I used EncodedSeekPerformanceTest to test the performance.
> First I used YCSB to write 1,000,000 rows, each row with only one qualifier, 
> and valueLength = 16B/64B/256B/1KB.
> Then I used EncodedSeekPerformanceTest to test random reads of 10,000 or 
> 1,000,000 rows, and also recorded the HFileBlock's dataSize/dataWithMetaSize 
> for each encoding.
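The lookup idea in the quoted description can be sketched as a binary search over the stored row offsets (a simplified illustration only; in the actual block the comparison deserializes and compares cell keys at each offset, and the names here are hypothetical):

```java
import java.util.Arrays;

public class RowIndexSearchSketch {
    // Hypothetical: rowKeys are the sorted first keys of each row in the
    // block, rowOffsets are the corresponding byte offsets of each row.
    static int findRowOffset(int[] rowOffsets, int[] rowKeys, int targetKey) {
        // Binary search over the sorted row keys instead of scanning cells
        // sequentially from the start of the block.
        int idx = Arrays.binarySearch(rowKeys, targetKey);
        return idx >= 0 ? rowOffsets[idx] : -1; // -1: row not in this block
    }

    public static void main(String[] args) {
        int[] rowKeys = {10, 20, 30, 40};
        int[] rowOffsets = {0, 120, 250, 400};
        System.out.println(findRowOffset(rowOffsets, rowKeys, 30)); // 250
        System.out.println(findRowOffset(rowOffsets, rowKeys, 25)); // -1
    }
}
```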



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)