[
https://issues.apache.org/jira/browse/HDFS-14535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Zheng Hu updated HDFS-14535:
----------------------------
Description:
Our HBase team is trying to read blocks from HDFS directly into pooled off-heap
ByteBuffers (HBASE-21879). In a recent benchmark we found that almost 45% of
the heap allocation came from the DFS client. The heap allocation flame graph
can be seen here:
https://issues.apache.org/jira/secure/attachment/12970295/async-prof-pid-25042-alloc-2.svg
After checking the code path, we found that when requesting file descriptors
from a DomainPeer, we allocate a large 8KB buffer for the BufferedOutputStream,
even though the protocol content is quite small, just a few bytes.
This put heavy GC pressure on HBase when cacheHitRatio < 60%, which
increased the HBase P999 latency. Instead, we can allocate a small buffer
for the BufferedOutputStream, such as 512 bytes, which is enough to hold the
short-circuit fd protocol content. We've created a patch that does this, and
the allocation flame graph shows that after the patch, the share of heap
allocation from the DFS client dropped from 45% to 27%, which I think is a
very good result. See:
https://issues.apache.org/jira/secure/attachment/12970475/async-prof-pid-24534-alloc-2.svg
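The idea described above can be sketched in a few lines of Java. This is only
an illustration, not the attached patch: the class name SmallBufferSketch, the
constant SMALL_BUFFER_SIZE, the helper writeRequest, and the placeholder
payload are all hypothetical, and a ByteArrayOutputStream stands in for the
DomainPeer output stream. It shows that passing an explicit, smaller size to
the two-argument BufferedOutputStream constructor does not change the bytes
that are written for a tiny request; only the buffer size differs.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

public class SmallBufferSketch {
    // Hypothetical constant; 512 bytes is the size suggested in the report.
    static final int SMALL_BUFFER_SIZE = 512;
    // The 8KB buffer size that the report identifies as the current behavior.
    static final int LARGE_BUFFER_SIZE = 8192;

    // Writes a tiny request through a BufferedOutputStream of the given size
    // and returns the bytes that reached the underlying sink. The in-memory
    // sink is a stand-in for the peer's output stream (an assumption).
    static byte[] writeRequest(int bufferSize, byte[] payload) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out =
                 new DataOutputStream(new BufferedOutputStream(sink, bufferSize))) {
            out.writeShort(28);   // placeholder header value, not the real protocol
            out.write(payload);   // placeholder request body of a few bytes
            out.flush();
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] payload = {1, 2, 3, 4};
        byte[] viaLarge = writeRequest(LARGE_BUFFER_SIZE, payload);
        byte[] viaSmall = writeRequest(SMALL_BUFFER_SIZE, payload);
        // Both buffer sizes deliver the same six bytes to the sink.
        System.out.println(Arrays.equals(viaLarge, viaSmall)); // prints "true"
        System.out.println(viaSmall.length);                   // prints "6"
    }
}
```

Because the request is far smaller than either buffer, the smaller buffer
still needs only a single flush; the saving is in the per-request allocation.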
We hope the attached patch can be merged into HDFS trunk and also Hadoop-2.8.x;
HBase will benefit a lot from this.
Thanks.
was:
Our HBase team is trying to read blocks from HDFS directly into pooled off-heap
ByteBuffers (HBASE-21879). In a recent benchmark we found that almost 45% of
the heap allocation came from the DFS client. The heap allocation flame graph
can be seen here:
https://issues.apache.org/jira/secure/attachment/12970295/async-prof-pid-25042-alloc-2.svg
After checking the code path, we found that when requesting file descriptors
from a DomainPeer, we allocate a large 8KB buffer for the BufferedOutputStream,
even though the protocol content is quite small, just a few bytes.
This put heavy GC pressure on HBase when cacheHitRatio < 60%, which
increased the HBase P999 latency. Instead, we can allocate a small buffer
for the BufferedOutputStream, such as 512 bytes, which is enough to hold the
short-circuit fd protocol content. We've created a patch that does this, and
the allocation flame graph shows that after the patch, the share of heap
allocation from the DFS client dropped from 45% to 27%, which I think is a
very good result.
We hope the attached patch can be merged into HDFS trunk and also Hadoop-2.8.x;
HBase will benefit a lot from this.
Thanks.
> The default 8KB buffer in requestFileDescriptors#BufferedOutputStream is
> causing lots of heap allocation in HBase when using short-circuit read
> ----------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-14535
> URL: https://issues.apache.org/jira/browse/HDFS-14535
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Zheng Hu
> Assignee: Zheng Hu
> Priority: Major