[ https://issues.apache.org/jira/browse/HBASE-21879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16767865#comment-16767865 ]

Wei-Chiu Chuang edited comment on HBASE-21879 at 2/14/19 4:02 AM:
------------------------------------------------------------------

It doesn't look like there will ever be another Hadoop 2.7.x release. LinkedIn is 
one of the big users of this release line, and they're looking to upgrade to 2.10 
soon.

I'm pretty sure we can get HDFS-3246 into Hadoop 2.x. It doesn't look like a 
big change.

(Or you can patch Hadoop yourself)
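
For reference, a minimal sketch of what the positioned ByteBuffer read from 
HDFS-3246 looks like to a caller. The preadFully helper here is hypothetical, 
and the read(long, ByteBuffer) overload on FSDataInputStream only exists once 
HDFS-3246 is in the Hadoop build you run against:

{code}
import java.io.IOException;
import java.nio.ByteBuffer;

import org.apache.hadoop.fs.FSDataInputStream;

public final class ByteBufferPreadSketch {
  /**
   * Hypothetical helper: fill buf starting at the given file offset.
   * Relies on the positioned read(long, ByteBuffer) that HDFS-3246 adds;
   * it does not move the stream position, so concurrent preads are safe.
   */
  static void preadFully(FSDataInputStream in, long offset, ByteBuffer buf)
      throws IOException {
    while (buf.hasRemaining()) {
      int n = in.read(offset, buf); // returns bytes read, or -1 at EOF
      if (n < 0) {
        throw new IOException("Premature EOF at offset " + offset);
      }
      offset += n;
    }
  }
}
{code}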

 

Regarding the upgrade plan, I can say Hadoop 2.8.x is quite stable, given that 
Yahoo adopted this release line, and I think they'll stay there for quite a 
while. You may try Hadoop 2.9 if there's new stuff you need, but beyond that 
I'm not hearing of anyone adopting it.



> Read HFile's block to ByteBuffer directly instead of to byte for reducing 
> young gc purpose
> ------------------------------------------------------------------------------------------
>
>                 Key: HBASE-21879
>                 URL: https://issues.apache.org/jira/browse/HBASE-21879
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Zheng Hu
>            Assignee: Zheng Hu
>            Priority: Major
>             Fix For: 3.0.0, 2.2.0, 2.3.0, 2.1.4
>
>         Attachments: QPS-latencies-before-HBASE-21879.png, 
> gc-data-before-HBASE-21879.png
>
>
> In HFileBlock#readBlockDataInternal, we have the following:
> {code}
> @VisibleForTesting
> protected HFileBlock readBlockDataInternal(FSDataInputStream is, long offset,
>     long onDiskSizeWithHeaderL, boolean pread, boolean verifyChecksum, boolean updateMetrics)
>     throws IOException {
>   // .....
>   // TODO: Make this ByteBuffer-based. Will make it easier to go to HDFS with BBPool (offheap).
>   byte[] onDiskBlock = new byte[onDiskSizeWithHeader + hdrSize];
>   int nextBlockOnDiskSize = readAtOffset(is, onDiskBlock, preReadHeaderSize,
>       onDiskSizeWithHeader - preReadHeaderSize, true, offset + preReadHeaderSize, pread);
>   if (headerBuf != null) {
>     // ...
>   }
>   // ...
> }
> {code}
> In the read path, we still read the block from the HFile into an on-heap 
> byte[], then asynchronously copy that byte[] into the off-heap bucket cache. 
> In my 100% get performance test, I also observed frequent young GCs; the 
> largest memory footprint in the young gen should be the on-heap block byte[].
> In fact, we can read the HFile's block into a ByteBuffer directly instead of 
> a byte[] to reduce young GC pressure. We did not implement this before 
> because the older HDFS client had no ByteBuffer read interface, but 2.7+ 
> supports it now, so I think we can fix this now.
> Will provide a patch and some perf comparisons for this.
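
As a rough illustration of the direct-ByteBuffer read the description proposes, 
here is a minimal sketch (mine, not the attached patch). It assumes the 
underlying stream implements ByteBufferReadable, which the description says the 
HDFS client supports in 2.7+; otherwise read(ByteBuffer) throws 
UnsupportedOperationException. Note it uses seek + read, so a true positioned 
(pread) variant still needs HDFS-3246:

{code}
import java.io.IOException;
import java.nio.ByteBuffer;

import org.apache.hadoop.fs.FSDataInputStream;

public final class DirectBlockReadSketch {
  /**
   * Hypothetical sketch: read a block into a direct ByteBuffer so the bytes
   * never land in a young-gen byte[]. Uses seek + read(ByteBuffer), which
   * moves the stream position, so callers must serialize access to the stream.
   */
  static ByteBuffer readBlock(FSDataInputStream is, long offset, int size)
      throws IOException {
    ByteBuffer block = ByteBuffer.allocateDirect(size); // off-heap target
    is.seek(offset);
    while (block.hasRemaining()) {
      // read(ByteBuffer) requires ByteBufferReadable under the hood
      if (is.read(block) < 0) {
        throw new IOException("Premature EOF reading block at " + offset);
      }
    }
    block.flip(); // ready for the consumer (e.g. the bucket cache)
    return block;
  }
}
{code}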



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
