[ 
https://issues.apache.org/jira/browse/HDFS-15486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin updated HDFS-15486:
-----------------------------
    Description: 
When the NameNode in our cluster is under very high load, we often find it 
stuck in async edit log handling.

We used the async-profiler tool to capture a flame graph:

!Async-profile-(2).jpg!

This happens because the async edit log thread, after consuming an Edit from 
the queue, also triggers the sendResponse call.

However, the sendResponse call is relatively expensive here: our cluster has 
security enabled, so encoding operations are performed when the response is 
returned.

We frequently catch moments of costly sendResponse operations when the RPC 
call queue is full.

!async-profile-(1).jpg!

Slow consumption of Edits in the async edit log easily fills up the pending 
Edit queue, which then blocks the enqueue operation invoked from methods in 
the FSNamesystem class that hold the write lock.
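
To illustrate this back-pressure, here is a minimal Java model, assuming a 
bounded ArrayBlockingQueue stands in for the pending Edit queue and 
Thread.sleep stands in for the costly sendResponse. The class and method 
names (InlineResponseBackPressure, countBlockedEnqueues, slowSendResponse) 
are hypothetical and not taken from the Hadoop code base.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical model of the problem (not Hadoop code): a bounded pending-edit
 * queue drained by ONE consumer thread that also sends each response inline.
 */
public class InlineResponseBackPressure {

  /** Stand-in for a costly sendResponse, e.g. security-related encoding. */
  static void slowSendResponse(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /**
   * Enqueues {@code edits} items, allowing each enqueue at most
   * {@code offerTimeoutMillis}, and returns how many enqueues timed out,
   * i.e. how often a writer would have been stuck waiting for queue space.
   */
  static int countBlockedEnqueues(int capacity, int edits,
      long responseMillis, long offerTimeoutMillis) {
    BlockingQueue<Integer> pending = new ArrayBlockingQueue<>(capacity);
    Thread consumer = new Thread(() -> {
      try {
        while (true) {
          int edit = pending.take();
          if (edit < 0) {
            return; // poison pill: stop consuming
          }
          // The edit would be written and synced here; the response is then
          // sent on this same thread, so the next take() waits for it.
          slowSendResponse(responseMillis);
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    consumer.start();

    int blocked = 0;
    try {
      for (int i = 0; i < edits; i++) {
        // In the NameNode this enqueue runs while the write lock is held.
        // A timed-out offer is only counted (and the edit dropped) in this
        // model; the real enqueue would keep waiting.
        if (!pending.offer(i, offerTimeoutMillis, TimeUnit.MILLISECONDS)) {
          blocked++;
        }
      }
      pending.put(-1); // waits for space, then tells the consumer to stop
      consumer.join();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return blocked;
  }

  public static void main(String[] args) {
    System.out.println("timed-out enqueues, slow responses: "
        + countBlockedEnqueues(4, 20, 50, 5));
    System.out.println("timed-out enqueues, instant responses: "
        + countBlockedEnqueues(4, 20, 0, 5));
  }
}
```

With slow responses the single consumer frees a slot only once per response, 
so most enqueue attempts have to wait, which is the stall a writer holding 
the lock would see.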

The proposed enhancement is to use multiple threads to execute sendResponse 
calls in parallel. sendResponse does not need write lock protection, so this 
change is safe.
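
A sketch of the proposed idea, under stated assumptions: the single consumer 
still writes and syncs edits in order, but hands each sendResponse to a fixed 
thread pool (Executors.newFixedThreadPool) instead of running it inline. This 
is not the actual patch; ParallelResponseSketch, run, Result and 
slowSendResponse are hypothetical names, and Thread.sleep again stands in for 
the costly response encoding.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch (not the actual patch): one consumer keeps edit order,
 * while a small pool sends the responses in parallel.
 */
public class ParallelResponseSketch {

  /** Outcome of one run: consumer drain time and number of responses sent. */
  static final class Result {
    final long consumerMillis;
    final int responsesSent;

    Result(long consumerMillis, int responsesSent) {
      this.consumerMillis = consumerMillis;
      this.responsesSent = responsesSent;
    }
  }

  /** Stand-in for a costly sendResponse, e.g. security-related encoding. */
  static void slowSendResponse(long millis) {
    try {
      Thread.sleep(millis);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  static Result run(int edits, int responderThreads, long responseMillis) {
    BlockingQueue<Integer> pending = new ArrayBlockingQueue<>(edits + 1);
    ExecutorService responders = Executors.newFixedThreadPool(responderThreads);
    AtomicInteger sent = new AtomicInteger();
    long[] consumerMillis = new long[1];

    Thread consumer = new Thread(() -> {
      long start = System.nanoTime();
      try {
        while (true) {
          int edit = pending.take();
          if (edit < 0) {
            break; // poison pill
          }
          // Assumption: the edit is written and synced here, on this single
          // thread, and only then is its response handed to the pool.
          responders.execute(() -> {
            slowSendResponse(responseMillis);
            sent.incrementAndGet();
          });
        }
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      consumerMillis[0] =
          TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    });

    try {
      for (int i = 0; i < edits; i++) {
        pending.put(i);
      }
      pending.put(-1);
      consumer.start();
      consumer.join(); // consumer is done; responses may still be in flight
      responders.shutdown();
      if (!responders.awaitTermination(1, TimeUnit.MINUTES)) {
        throw new IllegalStateException("responders did not finish");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return new Result(consumerMillis[0], sent.get());
  }

  public static void main(String[] args) {
    Result r = run(20, 4, 50);
    System.out.println("consumer drained queue in " + r.consumerMillis
        + " ms; responses sent: " + r.responsesSent);
  }
}
```

The consumer now spends only the time needed to submit each response, so 
the pending queue drains independently of response cost. Responses may 
complete in a different order than the edits; since each one answers a 
different RPC call this is assumed to be acceptable here, but that would need 
to be checked against the real edit log code.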

  was:
When the NameNode in our cluster is under very high load, we often find it 
stuck in async edit log handling.

We used the async-profiler tool to capture a flame graph:

!Async-profile-(2).jpg!

This happens because the async edit log thread, after consuming an Edit from 
the queue, also triggers the sendResponse call.

However, the sendResponse call is relatively expensive here: our cluster has 
security enabled, so encoding operations are performed when the response is 
returned.

We frequently catch moments of costly sendResponse operations when the RPC 
call queue is full.

!async-profile-(1).jpg!

Slow consumption of Edits in the async edit log leaves the pending Edit queue 
full, which then blocks the enqueue operation invoked from methods in the 
FSNamesystem class that hold the write lock.

The proposed enhancement is to use multiple threads to execute sendResponse 
calls in parallel. sendResponse does not need write lock protection, so this 
change is safe.


> Costly sendResponse operation slows down async editlog handling
> ---------------------------------------------------------------
>
>                 Key: HDFS-15486
>                 URL: https://issues.apache.org/jira/browse/HDFS-15486
>             Project: Hadoop HDFS
>          Issue Type: Bug
>    Affects Versions: 2.7.0
>            Reporter: Yiqun Lin
>            Priority: Major
>         Attachments: Async-profile-(2).jpg, async-profile-(1).jpg
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
