Zheng Hu created HBASE-21879:
--------------------------------

Summary: Read HFile's block to ByteBuffer directly instead of to byte[] for reducing young gc purpose
Key: HBASE-21879
URL: https://issues.apache.org/jira/browse/HBASE-21879
Project: HBase
Issue Type: Improvement
Reporter: Zheng Hu
Assignee: Zheng Hu
Fix For: 3.0.0, 2.2.0, 2.3.0, 2.1.4
In HFileBlock#readBlockDataInternal, we have the following:

{code}
@VisibleForTesting
protected HFileBlock readBlockDataInternal(FSDataInputStream is, long offset,
    long onDiskSizeWithHeaderL, boolean pread, boolean verifyChecksum,
    boolean updateMetrics) throws IOException {
  // .....
  // TODO: Make this ByteBuffer-based. Will make it easier to go to HDFS with BBPool (offheap).
  byte [] onDiskBlock = new byte[onDiskSizeWithHeader + hdrSize];
  int nextBlockOnDiskSize = readAtOffset(is, onDiskBlock, preReadHeaderSize,
      onDiskSizeWithHeader - preReadHeaderSize, true, offset + preReadHeaderSize, pread);
  if (headerBuf != null) {
    // ...
  }
  // ...
}
{code}

In the read path, we still read the block from the HFile into an on-heap byte[], then asynchronously copy that on-heap byte[] into the off-heap bucket cache. In my 100% get performance test, I also observed frequent young GC; the largest memory footprint in the young generation should be these on-heap block byte[] arrays. In fact, we can read the HFile's block directly into a ByteBuffer instead of a byte[] to reduce young GC. We did not implement this before because the older HDFS client had no ByteBuffer read interface, but HDFS 2.7+ supports it, so I think we can fix this now.

I will provide a patch and some performance comparisons for this.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
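To illustrate the idea (this is a minimal sketch, not the actual patch): instead of allocating a fresh on-heap byte[] per block, read the block's bytes at a given offset straight into a ByteBuffer, which can be off-heap. As an assumption for a self-contained example, the JDK's positional `FileChannel.read(ByteBuffer, long)` stands in for the HDFS client's ByteBuffer read support, and the names `DirectBlockReader`, `readBlock` and `readFully` are hypothetical, not HBase or HDFS APIs.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: java.nio FileChannel is used as a stand-in for the HDFS client.
final class DirectBlockReader {

  // Reads exactly `size` bytes starting at `offset` into a new buffer.
  // With direct == true the bytes live off-heap, so no large byte[] is
  // allocated on the Java heap for the block.
  static ByteBuffer readBlock(FileChannel ch, long offset, int size, boolean direct)
      throws IOException {
    ByteBuffer buf = direct ? ByteBuffer.allocateDirect(size) : ByteBuffer.allocate(size);
    readFully(ch, buf, offset);
    buf.flip(); // switch the buffer from filling to reading
    return buf;
  }

  // Positional read that loops until dst is full; it does not move the
  // channel's own position, similar in spirit to a pread.
  static void readFully(FileChannel ch, ByteBuffer dst, long offset) throws IOException {
    long pos = offset;
    while (dst.hasRemaining()) {
      int n = ch.read(dst, pos);
      if (n < 0) {
        throw new EOFException("Premature EOF at offset " + pos);
      }
      pos += n;
    }
  }

  public static void main(String[] args) throws IOException {
    Path tmp = Files.createTempFile("hfile-sketch", ".bin");
    byte[] data = new byte[64];
    for (int i = 0; i < data.length; i++) {
      data[i] = (byte) i;
    }
    Files.write(tmp, data);
    try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
      ByteBuffer block = readBlock(ch, 16, 8, true);
      // prints "true 8 16"
      System.out.println(block.isDirect() + " " + block.remaining() + " " + block.get(0));
    } finally {
      Files.delete(tmp);
    }
  }
}
```

Allocating a direct buffer for every read is relatively expensive, so a real implementation would more likely take buffers from a reusable pool (the TODO in the code above mentions a BBPool) rather than calling `allocateDirect` per block as this sketch does.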