Zheng Hu created HBASE-21879:
--------------------------------

             Summary: Read HFile's block to ByteBuffer directly instead of to byte[] for reducing young gc purpose
                 Key: HBASE-21879
                 URL: https://issues.apache.org/jira/browse/HBASE-21879
             Project: HBase
          Issue Type: Improvement
            Reporter: Zheng Hu
            Assignee: Zheng Hu
             Fix For: 3.0.0, 2.2.0, 2.3.0, 2.1.4


In HFileBlock#readBlockDataInternal, we have the following:
{code}
@VisibleForTesting
protected HFileBlock readBlockDataInternal(FSDataInputStream is, long offset,
    long onDiskSizeWithHeaderL, boolean pread, boolean verifyChecksum, boolean updateMetrics)
    throws IOException {
  // .....
  // TODO: Make this ByteBuffer-based. Will make it easier to go to HDFS with BBPool (offheap).
  byte[] onDiskBlock = new byte[onDiskSizeWithHeader + hdrSize];
  int nextBlockOnDiskSize = readAtOffset(is, onDiskBlock, preReadHeaderSize,
      onDiskSizeWithHeader - preReadHeaderSize, true, offset + preReadHeaderSize, pread);
  if (headerBuf != null) {
    // ...
  }
  // ...
}
{code}

In the read path, we still read the block from the HFile into an on-heap byte[], and then copy that on-heap byte[] into the off-heap bucket cache asynchronously. In my 100% get performance test, I also observed frequent young GCs; the largest memory footprint in the young gen should be these on-heap block byte[] arrays.
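
To make the double copy concrete, here is a minimal, self-contained illustration (class and method names are made up, not from the HBase code base) of the pattern described above: the block bytes land in a short-lived on-heap byte[] first, then get copied into an off-heap ByteBuffer, so every block read leaves a byte[] behind for the young collector.

{code}
import java.nio.ByteBuffer;

// Hypothetical illustration of the current read path's double copy.
public final class DoubleCopyDemo {
  public static ByteBuffer copyToOffheap(byte[] onDiskBlock) {
    // Destination stands in for an off-heap bucket-cache slot.
    ByteBuffer offheap = ByteBuffer.allocateDirect(onDiskBlock.length);
    offheap.put(onDiskBlock); // second copy; the byte[] becomes young-gen garbage
    offheap.flip();
    return offheap;
  }
}
{code}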

In fact, we can read the HFile's block into a ByteBuffer directly instead of into a byte[], to reduce young GC pressure. We did not implement this before because the older HDFS client had no ByteBuffer read interface, but HDFS 2.7+ supports it, so I think we can fix this now.
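
As a rough sketch of the direction (not the actual patch; the class and helper names are hypothetical), the HDFS client's ByteBufferReadable interface lets us read straight into a ByteBuffer, including a direct/off-heap one, with the old byte[] path kept as a fallback for streams that do not support it:

{code}
import java.io.IOException;
import java.nio.ByteBuffer;

import org.apache.hadoop.fs.ByteBufferReadable;
import org.apache.hadoop.fs.FSDataInputStream;

public final class ByteBufferBlockReader {

  /**
   * Read {@code length} bytes at {@code offset} into {@code buf}; the buffer
   * may be direct (off-heap), e.g. leased from a buffer pool.
   */
  public static void readFully(FSDataInputStream is, ByteBuffer buf,
      long offset, int length) throws IOException {
    if (is.getWrappedStream() instanceof ByteBufferReadable) {
      // Cap the buffer so we never read past the requested length.
      buf.limit(buf.position() + length);
      // seek+read moves the stream position (unlike pread); a real patch
      // would need to handle concurrent readers, kept simple here.
      is.seek(offset);
      while (buf.hasRemaining()) {
        int read = is.read(buf); // fills the ByteBuffer directly, no byte[] copy
        if (read < 0) {
          throw new IOException("Premature EOF, " + buf.remaining() + " bytes still expected");
        }
      }
    } else {
      // Fallback for streams without ByteBufferReadable: the old on-heap copy.
      byte[] tmp = new byte[length];
      is.readFully(offset, tmp, 0, length);
      buf.put(tmp);
    }
  }
}
{code}

The fallback branch matters because FSDataInputStream#read(ByteBuffer) throws UnsupportedOperationException when the wrapped stream does not implement ByteBufferReadable, so checking the wrapped stream keeps compatibility with such filesystems.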

Will provide a patch and some performance comparisons for this.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
