[
https://issues.apache.org/jira/browse/HBASE-21879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766838#comment-16766838
]
Anoop Sam John commented on HBASE-21879:
----------------------------------------
This direct copy is possible iff it is an off-heap DRAM bucket cache, right?
For other types of IOEngine, the same path still has to be there. Need to see
how complicated the code will be then. Or should we try to use the BB pool (off
heap) available in the RS? We use this pool at the RPC layer now. If we have
buffers available, make use of them? Then the question is what to do if the size
of the block > one pooled buffer. Use multiple buffers? I am just trying to give
different options here. Agree that we can try to solve this young GC issue.
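To make the "multiple buffers" option concrete, here is a minimal sketch, assuming a hypothetical fixed-size pool of direct buffers and a plain java.nio FileChannel standing in for the HDFS stream. SimplePool, take, give and readFully are illustrative names, not HBase APIs, and the real RS pool may behave differently. A block larger than one pooled buffer is spread over several buffers and filled with a single scattering read loop.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayDeque;

final class PooledBlockReadSketch {
  // Hypothetical pool of fixed-size direct buffers (not the HBase RPC pool).
  static final class SimplePool {
    private final ArrayDeque<ByteBuffer> free = new ArrayDeque<>();
    private final int bufferSize;

    SimplePool(int bufferSize, int count) {
      this.bufferSize = bufferSize;
      for (int i = 0; i < count; i++) {
        free.add(ByteBuffer.allocateDirect(bufferSize));
      }
    }

    // Returns buffers whose limits add up to blockSize, or null when the
    // pool cannot cover the block (the caller would fall back to on-heap).
    ByteBuffer[] take(int blockSize) {
      int needed = (blockSize + bufferSize - 1) / bufferSize;
      if (needed > free.size()) {
        return null;
      }
      ByteBuffer[] bufs = new ByteBuffer[needed];
      int remaining = blockSize;
      for (int i = 0; i < needed; i++) {
        ByteBuffer b = free.poll();
        b.clear();
        b.limit(Math.min(bufferSize, remaining));
        remaining -= b.limit();
        bufs[i] = b;
      }
      return bufs;
    }

    // Hands the buffers back to the pool once the block is no longer used.
    void give(ByteBuffer[] bufs) {
      for (ByteBuffer b : bufs) {
        free.add(b);
      }
    }
  }

  // Fills all buffers from the channel, starting at the given file offset.
  static void readFully(FileChannel ch, long offset, ByteBuffer[] bufs) throws IOException {
    ch.position(offset);
    long left = 0;
    for (ByteBuffer b : bufs) {
      left += b.remaining();
    }
    while (left > 0) {
      long n = ch.read(bufs); // scattering read across the buffer array
      if (n < 0) {
        throw new EOFException("file ended before the block was fully read");
      }
      left -= n;
    }
  }

  public static void main(String[] args) throws IOException {
    Path file = Files.createTempFile("block-sketch", ".bin");
    try {
      Files.write(file, new byte[] {0, 1, 2, 3, 4, 5, 6, 7, 8, 9});
      SimplePool pool = new SimplePool(4, 3);
      ByteBuffer[] bufs = pool.take(10); // 10-byte block over 4-byte buffers
      try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
        readFully(ch, 0, bufs);
      }
      System.out.println(bufs.length + " buffers, last byte = " + bufs[2].get(1));
      pool.give(bufs);
    } finally {
      Files.deleteIfExists(file);
    }
  }
}
```

Returning null when the pool is exhausted keeps the existing on-heap path available as a fallback, which matches the "if we have buffers available, make use of them" idea above.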
> Read HFile's block to ByteBuffer directly instead of to byte for reducing
> young gc purpose
> ------------------------------------------------------------------------------------------
>
> Key: HBASE-21879
> URL: https://issues.apache.org/jira/browse/HBASE-21879
> Project: HBase
> Issue Type: Improvement
> Reporter: Zheng Hu
> Assignee: Zheng Hu
> Priority: Major
> Fix For: 3.0.0, 2.2.0, 2.3.0, 2.1.4
>
> Attachments: QPS-latencies-before-HBASE-21879.png,
> gc-data-before-HBASE-21879.png
>
>
> In HFileBlock#readBlockDataInternal, we have the following:
> {code}
> @VisibleForTesting
> protected HFileBlock readBlockDataInternal(FSDataInputStream is, long offset,
>     long onDiskSizeWithHeaderL, boolean pread, boolean verifyChecksum,
>     boolean updateMetrics) throws IOException {
>   // .....
>   // TODO: Make this ByteBuffer-based. Will make it easier to go to HDFS with
>   // BBPool (offheap).
>   byte [] onDiskBlock = new byte[onDiskSizeWithHeader + hdrSize];
>   int nextBlockOnDiskSize = readAtOffset(is, onDiskBlock, preReadHeaderSize,
>       onDiskSizeWithHeader - preReadHeaderSize, true, offset + preReadHeaderSize,
>       pread);
>   if (headerBuf != null) {
>     // ...
>   }
>   // ...
> }
> {code}
> In the read path, we still read the block from the HFile into an on-heap
> byte[], then copy the on-heap byte[] to the off-heap bucket cache
> asynchronously. In my 100% get performance test, I also observed frequent
> young GC. The largest memory footprint in the young gen should be the on-heap
> block byte[].
> In fact, we can read the HFile's block into a ByteBuffer directly instead of
> into a byte[] to reduce young GC. We did not implement this before because the
> older HDFS client had no ByteBuffer reading interface, but 2.7+ supports it
> now, so I think we can fix this.
> Will provide a patch and some perf comparison for this.
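As an illustration of the idea in the description, the following is a minimal sketch of reading a block straight into an off-heap ByteBuffer with a positional read, so that no intermediate on-heap byte[] is created. It assumes a plain java.nio FileChannel as a stand-in for the HDFS input stream; DirectBlockReadSketch and readBlock are hypothetical names and this is not the actual patch. It also allocates a fresh direct buffer per read for brevity, whereas a real read path would more likely take the buffer from a pool.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class DirectBlockReadSketch {
  // Reads exactly `size` bytes at `offset` into a direct (off-heap) buffer.
  // The positional read does not move the channel's own position, which is
  // similar in spirit to a pread.
  static ByteBuffer readBlock(FileChannel ch, long offset, int size) throws IOException {
    ByteBuffer block = ByteBuffer.allocateDirect(size);
    long pos = offset;
    while (block.hasRemaining()) {
      int n = ch.read(block, pos);
      if (n < 0) {
        throw new EOFException("file ended before the block was fully read");
      }
      pos += n;
    }
    block.flip(); // make the bytes just read available to the caller
    return block;
  }

  public static void main(String[] args) throws IOException {
    Path file = Files.createTempFile("hfile-sketch", ".bin");
    try {
      Files.write(file, new byte[] {10, 11, 12, 13, 14, 15, 16, 17});
      try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
        ByteBuffer block = readBlock(ch, 2, 4);
        // prints "true 4 12"
        System.out.println(block.isDirect() + " " + block.remaining() + " " + block.get(0));
      }
    } finally {
      Files.deleteIfExists(file);
    }
  }
}
```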