[
https://issues.apache.org/jira/browse/HDFS-14535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16855258#comment-16855258
]
Zheng Hu commented on HDFS-14535:
---------------------------------
bq. if you're missing the "open file" often enough for this to make a
difference, is the short-circuit feature actually helpful for the workload?
Maybe you would be better off disabling it entirely.
Hmm... IIRC, our HBase code seems to have a bug here: for Get operations, all
queries should share the same reader, which means they shouldn't request the
short-circuit fd so frequently. Anyway, I can check the HBase code again, and
the HDFS PR can still be merged into trunk.
bq. I also noticed in the flame graph a couple of other suspicious items like
Pattern.compile in DomainSocket.getEffectivePath(). That's not a heavy memory
allocator but certainly seems like an easy thing to optimize out for some CPU
win (in another patch).
That's true, I will open a PR for this when I have time.
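A minimal sketch of that kind of optimization, assuming getEffectivePath()
substitutes a literal port placeholder through String.replace (which compiles
a Pattern on every call on Java 8). The class name, the _PORT placeholder
constant and the method below are hypothetical illustrations, not the actual
DomainSocket code:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical stand-in for the path substitution; not the real Hadoop class.
final class EffectivePathSketch {
    // Compiled once at class load instead of on every call.
    private static final Pattern PORT_PLACEHOLDER =
        Pattern.compile("_PORT", Pattern.LITERAL);

    private EffectivePathSketch() {}

    // Replaces every literal "_PORT" in the path with the given port number.
    static String effectivePath(String path, int port) {
        String replacement = Matcher.quoteReplacement(String.valueOf(port));
        return PORT_PLACEHOLDER.matcher(path).replaceAll(replacement);
    }

    public static void main(String[] args) {
        System.out.println(effectivePath("/var/run/hdfs/dn._PORT", 50010));
    }
}
```

The precompiled Pattern is immutable and safe to share across threads; only
the Matcher is created per call.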
bq. Mind sending this as a PR for the apache/hadoop repo on github?
I've created the PR and attached it to this issue; please take a look.
Thanks.
> The default 8KB buffer in requestFileDescriptors#BufferedOutputStream is
> causing lots of heap allocation in HBase when using short-circuit read
> ----------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-14535
> URL: https://issues.apache.org/jira/browse/HDFS-14535
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Zheng Hu
> Assignee: Zheng Hu
> Priority: Major
> Attachments: HDFS-14535.patch
>
>
> Our HBase team is trying to read blocks from HDFS directly into pooled
> off-heap ByteBuffers (HBASE-21879). In a recent benchmark, we found that
> almost 45% of heap allocation came from the DFS client. The heap
> allocation flame graph can be seen here:
> https://issues.apache.org/jira/secure/attachment/12970295/async-prof-pid-25042-alloc-2.svg
> After checking the code path, we found that when requesting file descriptors
> from a DomainPeer, we allocate an 8KB buffer for the BufferedOutputStream,
> even though the protocol content is only a few bytes.
> This creates heavy GC pressure for HBase when cacheHitRatio < 60%, which
> increases the HBase P999 latency. Instead, we can pre-allocate a small
> buffer for the BufferedOutputStream, such as 512 bytes, which is enough for
> the short-circuit fd protocol content. We've created a patch that does this,
> and the allocation flame graph shows that after the patch, the heap allocation
> from the DFS client dropped from 45% to 27%, which is a good improvement.
> See:
> https://issues.apache.org/jira/secure/attachment/12970475/async-prof-pid-24534-alloc-2.svg
> I hope the attached patch can be merged into HDFS trunk and also
> Hadoop-2.8.x; HBase will benefit a lot from this.
> Thanks.
> For more details, see:
> https://issues.apache.org/jira/browse/HBASE-22387?focusedCommentId=16851639&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16851639
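To illustrate the idea in the description, here is a minimal sketch of sizing
a BufferedOutputStream explicitly instead of relying on its 8192-byte default.
It is not the attached patch: the class name, the SMALL_BUFFER_SIZE constant
and the request fields (a block id and a pool id) are assumptions made up for
the example, and the real short-circuit request is a protobuf message.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical illustration; not the actual DFS client code.
final class SmallBufferSketch {
    // Assumed size taken from the description: 512 bytes instead of 8KB.
    static final int SMALL_BUFFER_SIZE = 512;

    private SmallBufferSketch() {}

    // Writes a tiny made-up request through a buffer of the given size and
    // returns the bytes that reached the underlying sink.
    static byte[] writeRequest(int bufferSize, long blockId, String poolId)
            throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(sink, bufferSize))) {
            out.writeLong(blockId);
            out.writeUTF(poolId);
        } // close() flushes the buffered bytes into the sink
        return sink.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] withDefaultSize = writeRequest(8192, 42L, "BP-1");
        byte[] withSmallSize = writeRequest(SMALL_BUFFER_SIZE, 42L, "BP-1");
        System.out.println("same bytes on the wire: "
            + Arrays.equals(withDefaultSize, withSmallSize));
    }
}
```

The bytes written are the same for both sizes; only the per-request heap
allocation for the internal buffer changes.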
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)