Zheng Hu created HBASE-21879:
--------------------------------
Summary: Read HFile's block to ByteBuffer directly instead of to
byte[] for reducing young gc purpose
Key: HBASE-21879
URL: https://issues.apache.org/jira/browse/HBASE-21879
Project: HBase
Issue Type: Improvement
Reporter: Zheng Hu
Assignee: Zheng Hu
Fix For: 3.0.0, 2.2.0, 2.3.0, 2.1.4
In HFileBlock#readBlockDataInternal, we have the following:
{code}
@VisibleForTesting
protected HFileBlock readBlockDataInternal(FSDataInputStream is, long offset,
    long onDiskSizeWithHeaderL, boolean pread, boolean verifyChecksum,
    boolean updateMetrics) throws IOException {
  // .....
  // TODO: Make this ByteBuffer-based. Will make it easier to go to HDFS with BBPool (offheap).
  byte[] onDiskBlock = new byte[onDiskSizeWithHeader + hdrSize];
  int nextBlockOnDiskSize = readAtOffset(is, onDiskBlock, preReadHeaderSize,
      onDiskSizeWithHeader - preReadHeaderSize, true,
      offset + preReadHeaderSize, pread);
  if (headerBuf != null) {
    // ...
  }
  // ...
}
{code}
In the read path, we still read the block from the HFile into an on-heap
byte[], then copy that byte[] into the offheap bucket cache asynchronously.
In my 100% get performance test, I also observed frequent young GCs; the
largest memory footprint in the young gen should be the on-heap block byte[].
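To make the double copy concrete, here is a minimal sketch of today's pattern (simplified: readFully stands in for readAtOffset, and the direct buffer stands in for the bucket cache's offheap storage):
{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.hadoop.fs.FSDataInputStream;

public class OnHeapCopySketch {
  // Simplified read path as it works today: every block read allocates a
  // fresh on-heap byte[] that becomes young-gen garbage as soon as its
  // contents are copied into the offheap cache.
  static ByteBuffer readBlockViaHeap(FSDataInputStream is, long offset,
      int blockSize) throws IOException {
    byte[] onDiskBlock = new byte[blockSize];        // short-lived heap allocation
    is.readFully(offset, onDiskBlock, 0, blockSize); // positional pread into byte[]
    ByteBuffer offheap = ByteBuffer.allocateDirect(blockSize);
    offheap.put(onDiskBlock);                        // second copy: heap -> offheap
    offheap.flip();
    return offheap;
  }
}
{code}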
In fact, we can read the HFile's block into a ByteBuffer directly instead of
into a byte[] to reduce young GC pressure. We did not implement this before
because the older HDFS client had no ByteBuffer reading interface, but 2.7+
supports this now, so I think we can fix this now.
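A minimal sketch of what the ByteBuffer-based read could look like, assuming the underlying HDFS stream implements org.apache.hadoop.fs.ByteBufferReadable (which FSDataInputStream#read(ByteBuffer) delegates to); the method name and the fallback handling here are illustrative, not the final patch:
{code}
import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.hadoop.fs.ByteBufferReadable;
import org.apache.hadoop.fs.FSDataInputStream;

public class ByteBufferReadSketch {
  // Read a block straight into a (possibly offheap) ByteBuffer, skipping the
  // intermediate on-heap byte[]. Real code must fall back to the byte[] path
  // when the wrapped stream does not support ByteBufferReadable, and note
  // that seek+read is stateful, unlike the positional pread used today.
  static void readBlockToByteBuffer(FSDataInputStream is, long offset,
      ByteBuffer dest) throws IOException {
    if (!(is.getWrappedStream() instanceof ByteBufferReadable)) {
      throw new UnsupportedOperationException("byte[] fallback required");
    }
    is.seek(offset);
    while (dest.hasRemaining()) {
      if (is.read(dest) < 0) { // delegates to ByteBufferReadable#read
        throw new IOException("Premature EOF at offset " + is.getPos());
      }
    }
  }
}
{code}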
Will provide a patch and some perf comparison for this.