[
https://issues.apache.org/jira/browse/HDFS-15486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17423081#comment-17423081
]
Yiqun Lin commented on HDFS-15486:
----------------------------------
Some notes on the draft patch above:
* The patch introduces a switch setting to enable async response handling.
* The patch is based on branch-2.7, not on the latest trunk.
> Costly sendResponse operation slows down async editlog handling
> ---------------------------------------------------------------
>
> Key: HDFS-15486
> URL: https://issues.apache.org/jira/browse/HDFS-15486
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.7.0
> Reporter: Yiqun Lin
> Priority: Major
> Attachments: Async-profile-(2).jpg, HDFS-15486_draft.patch,
> async-profile-(1).jpg
>
>
> When the NameNode in our cluster is under very high load, we find that it
> often gets stuck in async editlog handling.
> We used the async-profiler tool to capture a flame graph.
> !Async-profile-(2).jpg!
> This happens because the async editlog thread consumes an Edit from the
> queue and then triggers the sendResponse call.
> The sendResponse call is fairly expensive here, since our cluster has
> security enabled and must perform some encoding when returning the
> response.
> We often catch moments of costly sendResponse operations when the RPC call
> queue is full.
> !async-profile-(1).jpg!
> Slow consumption of Edits in the async editlog thread makes the pending
> Edit queue fill up easily, which then blocks the enqueue operation invoked
> from methods in the FSNamesystem class that hold the write lock.
> The proposed enhancement is to use multiple threads to execute the
> sendResponse calls in parallel. sendResponse does not need the write lock
> for protection, so this change is safe.
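
The idea in the description can be sketched roughly as follows. This is only an illustration and is not taken from HDFS-15486_draft.patch: the class AsyncResponseSketch, the Edit type, the drain method, and the asyncResponse flag are all hypothetical stand-ins, and the real sendResponse is replaced by a counter increment. A single consumer thread drains the pending queue, and when the switch is on it hands each response to a small thread pool instead of sending it inline.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class AsyncResponseSketch {

    /** Hypothetical stand-in for a queued edit whose caller awaits a response. */
    static final class Edit {
        final int txid;
        Edit(int txid) { this.txid = txid; }
    }

    /** Stand-in for the costly sendResponse call (assumption: no write lock needed). */
    static void sendResponse(Edit edit, AtomicInteger sent) {
        sent.incrementAndGet();
    }

    /**
     * Drains the pending queue on the calling (consumer) thread.
     * When asyncResponse is true, responses are handed to a pool of
     * responderThreads workers; otherwise they are sent inline.
     * Returns the number of responses sent.
     */
    static int drain(BlockingQueue<Edit> pending, boolean asyncResponse,
                     int responderThreads) throws InterruptedException {
        AtomicInteger sent = new AtomicInteger();
        ExecutorService responders =
            asyncResponse ? Executors.newFixedThreadPool(responderThreads) : null;

        Edit edit;
        while ((edit = pending.poll()) != null) {
            // In the real system the edit would be synced here, still on
            // the single consumer thread, before any response is sent.
            final Edit done = edit;
            if (responders != null) {
                // Off-load the expensive response so the consumer keeps draining.
                responders.execute(() -> sendResponse(done, sent));
            } else {
                sendResponse(done, sent);
            }
        }

        if (responders != null) {
            responders.shutdown();
            if (!responders.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("responders did not finish in time");
            }
        }
        return sent.get();
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Edit> pending = new ArrayBlockingQueue<>(4096);
        for (int i = 0; i < 1000; i++) {
            pending.add(new Edit(i));
        }
        System.out.println("responses sent: " + drain(pending, true, 4));
    }
}
```

With the flag off, the sketch falls back to the inline behaviour, which mirrors the role of the switch setting mentioned in the comment. Note that with a pool, responses for different edits may complete out of order; the sketch assumes that is acceptable because each response belongs to an independent call.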