Yiqun Lin created HDFS-15486:
--------------------------------
Summary: Costly sendResponse operation slows down async editlog handling
Key: HDFS-15486
URL: https://issues.apache.org/jira/browse/HDFS-15486
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 2.7.0
Reporter: Yiqun Lin
Attachments: Async-profile-(2).jpg, async-profile-(1).jpg
When our cluster's NameNode is under very high load, we find that it often
gets stuck in async editlog handling.
We used the async-profiler tool to capture a flame graph.
!Async-profile-(2).jpg!
This happens because the async editlog thread consumes an Edit from the queue
and then triggers the sendResponse call itself.
The sendResponse call is fairly expensive here because our cluster has
security enabled, so some encoding operations are performed when the response
is returned.
We often catch these costly sendResponse operations at moments when the RPC
call queue is full.
!async-profile-(1).jpg!
Slow consumption of Edits in the async editlog thread keeps the pending Edit
queue full, which then blocks the enqueue operation that is invoked from the
write-lock methods in the FSNamesystem class.
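
The blocking effect described above can be illustrated with a small standalone sketch. It is not the NameNode code: the class name FullQueueSketch, the helper tryEnqueue, and the use of a bounded ArrayBlockingQueue of strings are assumptions made only for illustration. It shows that once a bounded queue is full, a producer cannot enqueue until the consumer takes something out.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; not taken from FSEditLogAsync.
final class FullQueueSketch {

    /**
     * Tries to enqueue an edit, waiting at most timeoutMs for free space.
     * Returns false if the queue stayed full for the whole timeout.
     * A real producer using put() would instead block indefinitely,
     * possibly while still holding a lock.
     */
    static boolean tryEnqueue(BlockingQueue<String> pending, String edit,
                              long timeoutMs) {
        try {
            return pending.offer(edit, timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        BlockingQueue<String> pending = new ArrayBlockingQueue<>(2);
        System.out.println(tryEnqueue(pending, "edit-1", 50)); // space available
        System.out.println(tryEnqueue(pending, "edit-2", 50)); // space available
        System.out.println(tryEnqueue(pending, "edit-3", 50)); // queue is full
        pending.poll();                                        // consumer catches up
        System.out.println(tryEnqueue(pending, "edit-3", 50)); // space again
    }
}
```

With a slow consumer, the third call is the situation a writer ends up in: it waits on the queue rather than making progress.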
The proposed enhancement is to use multiple threads to execute the
sendResponse calls in parallel. sendResponse does not need to be protected by
the write lock, so this change is safe.
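
A minimal sketch of the idea, under stated assumptions, might look like the following. All names here (ParallelResponseSketch, Edit, drain, the responders pool) are hypothetical and are not the actual Hadoop classes or the eventual patch. The single consumer thread keeps draining the pending queue in order and hands each response to a fixed thread pool, so a slow sendResponse no longer delays the next dequeue.

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration only; names do not come from the Hadoop code base.
final class ParallelResponseSketch {

    /** Stand-in for a queued edit that carries a pending RPC response. */
    static final class Edit {
        final long txId;
        Edit(long txId) { this.txId = txId; }
    }

    private final ExecutorService responders;
    private final List<Long> sent = new CopyOnWriteArrayList<>();

    ParallelResponseSketch(int responderThreads) {
        this.responders = Executors.newFixedThreadPool(responderThreads);
    }

    /** Stand-in for the costly sendResponse (for example, encoding the reply). */
    private void sendResponse(Edit edit) {
        sent.add(edit.txId);
    }

    /** Drains the queue on the calling thread; responses run on the pool. */
    void drain(BlockingQueue<Edit> pending) {
        Edit edit;
        while ((edit = pending.poll()) != null) {
            // The edit itself would be logged and synced here, in order.
            final Edit done = edit;
            responders.execute(() -> sendResponse(done));
        }
    }

    /** Stops the pool, waits for queued responses, and returns the sent ids. */
    List<Long> shutdownAndGetSent() {
        responders.shutdown();
        try {
            responders.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sent;
    }

    public static void main(String[] args) {
        BlockingQueue<Edit> pending = new LinkedBlockingQueue<>();
        for (long i = 0; i < 100; i++) {
            pending.add(new Edit(i));
        }
        ParallelResponseSketch sketch = new ParallelResponseSketch(4);
        sketch.drain(pending);
        System.out.println("responses sent: " + sketch.shutdownAndGetSent().size());
    }
}
```

One consequence of this design is that responses may complete in a different order than the edits were dequeued. Whether that is acceptable for the real RPC path would need to be confirmed in the patch.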
--
This message was sent by Atlassian Jira
(v8.3.4#803005)