[ https://issues.apache.org/jira/browse/HDFS-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16185888#comment-16185888 ]

Kihwal Lee commented on HDFS-7964:
----------------------------------

bq. How many handler threads are used for the 285k QPS?
100. More handlers are not necessarily better: they increase lock contention,
and the typical parallelism in the NN is not very high anyway. Look at how many
cores a super busy NN can actually utilize; it's not very impressive. So, in
general, increasing the number of handlers too much is not a good idea.
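
To make the "more handlers are not necessarily better" point concrete, here is a
toy upper-bound model. It is an illustrative assumption, not how the NN is
measured or tuned: each call needs `serviceMillis` of handler time, and a
fraction `lockedFraction` of that time is spent holding one exclusive lock.
Handlers cap throughput at n / serviceTime, but the lock caps it at
1 / (lockedFraction x serviceTime) no matter how many handlers are added. The
class and parameter names are hypothetical.

```java
/**
 * Toy upper-bound model (hypothetical, for illustration only): throughput of a
 * pool of RPC handlers whose calls partly run under one exclusive lock.
 */
final class HandlerThroughputModel {
    private HandlerThroughputModel() {}

    /**
     * @param handlers       number of handler threads
     * @param serviceMillis  handler time per call, in milliseconds
     * @param lockedFraction fraction of serviceMillis spent holding the exclusive lock, in (0, 1]
     * @return an upper bound on calls per second
     */
    static double maxCallsPerSecond(int handlers, double serviceMillis, double lockedFraction) {
        if (handlers <= 0 || serviceMillis <= 0 || lockedFraction <= 0 || lockedFraction > 1) {
            throw new IllegalArgumentException("invalid model parameters");
        }
        // Bound 1: every handler is busy all the time.
        double handlerBound = handlers * 1000.0 / serviceMillis;
        // Bound 2: the lock admits one holder at a time, serializing that part of every call.
        double lockBound = 1000.0 / (serviceMillis * lockedFraction);
        return Math.min(handlerBound, lockBound);
    }

    /** Smallest handler count that reaches the lock bound; beyond it, extra handlers only wait. */
    static int saturatingHandlerCount(double lockedFraction) {
        return (int) Math.ceil(1.0 / lockedFraction);
    }
}
```

With `serviceMillis = 2.0` and `lockedFraction = 0.25`, four handlers already
reach the model's ceiling of 2,000 calls/s; eight handlers give the same bound
and just add waiters on the lock. Real NN calls mix shared (read) and exclusive
(write) locking, so this only sketches the shape of the argument.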

One of the exceptions would be the handlers serving long
{{getContentSummary}} calls. If called against a big tree, it can occupy a
handler for many seconds. If you have too many of these, the utilization and
throughput will drop. Some say it was a mistake to add {{getContentSummary}}
in the first place, but it's already there, so we need to deal with it. If it
becomes a major issue, we can think about handling them asynchronously and
freeing up handlers, à la async IBR processing. But the response latency here
will be variable, so it gets a bit more complicated.
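
A minimal sketch of the "free up the handler" idea, assuming a small dedicated
executor for slow calls and a callback that sends the response once the work
is done. `SlowCallOffloader`, `submit`, and the `Consumer<String>` responder are
hypothetical stand-ins; this is not how HDFS implements {{getContentSummary}}
or async IBR processing.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical sketch: hand long-running calls to a separate pool so RPC handlers return quickly. */
final class SlowCallOffloader implements AutoCloseable {
    // Small, bounded pool reserved for slow calls (e.g. a content summary over a big tree).
    private final ExecutorService slowPool;

    SlowCallOffloader(int threads) {
        this.slowPool = Executors.newFixedThreadPool(threads);
    }

    /**
     * Called on a handler thread. Schedules the work and returns immediately;
     * the response is delivered through respond when the work finishes, so its
     * latency varies with how long the work takes.
     */
    CompletableFuture<Void> submit(Supplier<String> work, Consumer<String> respond) {
        return CompletableFuture.supplyAsync(work, slowPool)
                // Turn a failure into an error response instead of dropping the call.
                .handle((result, error) -> error == null ? result : "ERROR: " + error.getMessage())
                .thenAccept(respond);
    }

    @Override
    public void close() {
        slowPool.shutdown(); // already-submitted calls still run to completion
    }
}
```

The bounded pool is the design choice that matters: even if many slow calls
arrive, they can tie up at most `threads` workers instead of the whole handler
pool, at the cost of the variable response latency mentioned above.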

If we improve the parallelism in the NN, more handlers will give us better
throughput. Fine-grained locking was one of the efforts aimed at this, but the
introduction of the snapshot feature made it very complicated, if not
impossible. The performance of the prototype was impressive, though.
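
In its simplest textbook form, fine-grained locking replaces one namespace-wide
lock with many smaller ones so that operations on unrelated parts of the tree
stop serializing. The sketch below uses generic lock striping keyed by the
top-level path component. It is an assumption for illustration, not the design
of the NN prototype mentioned above, and it deliberately ignores the cross-tree
cases (renames between directories, snapshots spanning subtrees) that make the
real problem hard. `StripedNamespaceLock` is a hypothetical name.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical striped namespace lock: one read/write lock per hash bucket of the top-level directory. */
final class StripedNamespaceLock {
    private final ReentrantReadWriteLock[] stripes;

    StripedNamespaceLock(int stripeCount) {
        if (stripeCount <= 0) throw new IllegalArgumentException("stripeCount must be positive");
        stripes = new ReentrantReadWriteLock[stripeCount];
        for (int i = 0; i < stripeCount; i++) {
            stripes[i] = new ReentrantReadWriteLock();
        }
    }

    /** Picks the stripe from the first path component, e.g. "/user/a/b" -> "user". */
    int stripeIndex(String path) {
        String[] parts = path.split("/");
        String top = parts.length > 1 ? parts[1] : "";
        return Math.floorMod(top.hashCode(), stripes.length);
    }

    ReentrantReadWriteLock lockFor(String path) {
        return stripes[stripeIndex(path)];
    }

    /** Runs a mutation while holding only the write lock of the path's stripe. */
    void write(String path, Runnable mutation) {
        ReentrantReadWriteLock.WriteLock lock = lockFor(path).writeLock();
        lock.lock();
        try {
            mutation.run();
        } finally {
            lock.unlock();
        }
    }
}
```

Writers under `/user` and `/tmp` can proceed in parallel whenever the two names
hash to different stripes, which is exactly the extra parallelism that would
let more handlers pay off. A snapshot that must see a consistent view across
several stripes is where this simple scheme breaks down.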

> Add support for async edit logging
> ----------------------------------
>
>                 Key: HDFS-7964
>                 URL: https://issues.apache.org/jira/browse/HDFS-7964
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>    Affects Versions: 2.0.2-alpha
>            Reporter: Daryn Sharp
>            Assignee: Daryn Sharp
>             Fix For: 2.8.0, 2.9.0, 3.0.0-alpha1
>
>         Attachments: HDFS-7964-branch-2.7.patch, 
> HDFS-7964-branch-2.8.0.patch, HDFS-7964.patch, HDFS-7964.patch, 
> HDFS-7964.patch, HDFS-7964.patch, HDFS-7964-rebase.patch
>
>
> Edit logging is a major source of contention within the NN.  logEdit is
> called within the namespace write lock, while logSync is called outside of the
> lock to allow greater concurrency.  The handler thread remains busy until
> logSync returns to provide the client with a durability guarantee for the
> response.
> A write-heavy RPC load and/or slow IO causes handlers to stall in logSync.
> Although the write lock is not held, readers are limited/starved and the call
> queue fills.  Combining an edit log thread with postponed RPC responses from
> HADOOP-10300 will provide the same durability guarantee but immediately free
> up the handlers.
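
The proposal above can be sketched as a single logger thread that batches queued
edits, syncs once per batch, and only then releases the postponed responses.
Everything here is hypothetical and simplified: `AsyncEditLogSketch`, `Edit`,
and the in-memory `journal` list stand in for the real edit log and its fsync,
and the futures stand in for postponed RPC responses. It is not the code
committed for this issue, nor the HADOOP-10300 API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical async edit logger: handlers enqueue and return; one thread syncs, then responds. */
final class AsyncEditLogSketch implements AutoCloseable {
    /** A pending edit plus the postponed response that must wait for its sync. */
    private static final class Edit {
        final String op;
        final CompletableFuture<Long> response = new CompletableFuture<>();
        Edit(String op) { this.op = op; }
    }

    private final BlockingQueue<Edit> pending = new LinkedBlockingQueue<>();
    private final List<String> journal = new ArrayList<>(); // stand-in for durable storage
    private final Thread logger = new Thread(this::runLoop, "edit-logger");
    private volatile boolean running = true;
    private volatile int syncs = 0;
    private long lastTxId = 0; // written only by the logger thread

    AsyncEditLogSketch() {
        logger.setDaemon(true);
        logger.start();
    }

    /** Handler side: enqueue and return; the future yields the txid once the edit is durable. */
    CompletableFuture<Long> logEdit(String op) {
        Edit edit = new Edit(op);
        pending.add(edit);
        return edit.response;
    }

    private void runLoop() {
        List<Edit> batch = new ArrayList<>();
        while (running || !pending.isEmpty()) {
            try {
                Edit first = pending.poll(10, TimeUnit.MILLISECONDS);
                if (first == null) continue;
                batch.add(first);
                pending.drainTo(batch); // everything queued meanwhile shares one sync
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            syncBatch(batch);
            batch.clear();
        }
    }

    /** One "sync" per batch, then release every postponed response in the batch. */
    private void syncBatch(List<Edit> batch) {
        long[] txIds = new long[batch.size()];
        synchronized (journal) {
            for (int i = 0; i < batch.size(); i++) {
                journal.add(batch.get(i).op);
                txIds[i] = ++lastTxId;
            }
            syncs++; // a real edit log would flush and fsync here
        }
        for (int i = 0; i < batch.size(); i++) {
            batch.get(i).response.complete(txIds[i]); // durable now, so the response may go out
        }
    }

    /** Stops after draining edits already queued; edits logged after this are not handled. */
    @Override
    public void close() {
        running = false;
        try {
            logger.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    List<String> journalSnapshot() {
        synchronized (journal) {
            return new ArrayList<>(journal);
        }
    }

    int syncCount() {
        return syncs;
    }
}
```

The handler pays only for a queue insert and goes back to serving other calls,
while the client still sees its response only after the edit is synced. Under
write-heavy load, one sync tends to cover many edits, which is where the
reduction in handler stalls would come from.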



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)