[
https://issues.apache.org/jira/browse/HDFS-14535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855018#comment-16855018
]
Todd Lipcon commented on HDFS-14535:
------------------------------------
+1, seems like an improvement to me. One thing I wonder, though -- if you're
missing the "open file" often enough for this to make a difference, is the
short-circuit feature actually helpful for the workload? Maybe you would be
better off disabling it entirely.
I also noticed in the flame graph a couple of other suspicious items like
Pattern.compile in DomainSocket.getEffectivePath(). That's not a heavy memory
allocator but certainly seems like an easy thing to optimize out for some CPU
win (in another patch).
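A minimal sketch of that kind of optimization, assuming the compile comes from a regex-based replace that runs on every call. The class name, the `_PORT` placeholder and the method signature here are illustrative assumptions, not the actual DomainSocket code:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EffectivePathSketch {
    // Compiled once at class load instead of on every call.
    // "_PORT" is an assumed placeholder token for this sketch.
    private static final Pattern PORT_TOKEN = Pattern.compile("_PORT");

    // Hypothetical helper: substitutes the port into a socket path template.
    public static String effectivePath(String path, int port) {
        String replacement = Matcher.quoteReplacement(String.valueOf(port));
        return PORT_TOKEN.matcher(path).replaceAll(replacement);
    }

    public static void main(String[] args) {
        // prints /var/run/hdfs/dn.50010
        System.out.println(effectivePath("/var/run/hdfs/dn._PORT", 50010));
    }
}
```

A call such as `path.replaceAll("_PORT", ...)` compiles the pattern each time it runs, so hoisting the Pattern into a static field removes that per-call work.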
Mind sending this as a PR for the apache/hadoop repo on github? It's easier for
me to merge via the github UI than manually from a patch on JIRA, though maybe
Wei Chiu has an environment ready to commit without it.
> The default 8KB buffer in requestFileDescriptors#BufferedOutputStream is
> causing lots of heap allocation in HBase when using short-circuit read
> ----------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-14535
> URL: https://issues.apache.org/jira/browse/HDFS-14535
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Zheng Hu
> Assignee: Zheng Hu
> Priority: Major
> Attachments: HDFS-14535.patch
>
>
> Our HBase team is trying to read blocks from HDFS directly into pooled
> off-heap ByteBuffers (HBASE-21879). In a recent benchmark we found that
> almost 45% of heap allocation came from the DFS client. The heap
> allocation flame graph can be seen here:
> https://issues.apache.org/jira/secure/attachment/12970295/async-prof-pid-25042-alloc-2.svg
> After checking the code path, we found that when requesting file descriptors
> from a DomainPeer, we allocate an 8KB buffer for the BufferedOutputStream,
> even though the protocol content is only a few bytes.
> This put heavy GC pressure on HBase when cacheHitRatio < 60%, which
> increased the HBase P999 latency. We can instead pre-allocate a small
> buffer for the BufferedOutputStream, such as 512 bytes, which is enough to
> hold the short-circuit fd protocol content. We've created a patch that does
> this, and the allocation flame graph shows that after the patch, the heap
> allocation from the DFS client dropped from 45% to 27%, which I think is a
> very good result. See:
> https://issues.apache.org/jira/secure/attachment/12970475/async-prof-pid-24534-alloc-2.svg
> We hope the attached patch can be merged into HDFS trunk and also
> Hadoop-2.8.x; HBase will benefit a lot from this.
> Thanks.
> For more details, see:
> https://issues.apache.org/jira/browse/HBASE-22387?focusedCommentId=16851639&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16851639
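
The buffer-size change described in the issue can be sketched roughly as follows. This is not the attached patch: the class, the `encode` helper and the message fields are hypothetical stand-ins for a small request. It relies only on the fact that `BufferedOutputStream` defaults to an 8192-byte buffer unless a size is passed to the constructor:

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;

public class SmallBufferSketch {
    // 512 bytes is the size suggested in the issue description.
    static final int SMALL_BUFFER_SIZE = 512;

    // Hypothetical request writer: a 2-byte version plus an 8-byte block id.
    public static void writeRequest(OutputStream peerOut, long blockId)
            throws IOException {
        // new BufferedOutputStream(peerOut) would allocate 8192 bytes here.
        DataOutputStream out = new DataOutputStream(
            new BufferedOutputStream(peerOut, SMALL_BUFFER_SIZE));
        out.writeShort(1);      // assumed version field, illustration only
        out.writeLong(blockId); // assumed payload, illustration only
        out.flush();
    }

    // Convenience wrapper that returns the bytes written for one request.
    public static byte[] encode(long blockId) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try {
            writeRequest(sink, blockId);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(encode(42L).length + " bytes written");
    }
}
```

Because the buffer is allocated once per request, shrinking it from 8192 to 512 bytes cuts the per-request allocation without changing what is sent on the wire.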