[jira] [Commented] (HDFS-13183) Standby NameNode process getBlocks request to reduce Active load

Wei-Chiu Chuang (Jira) Mon, 04 May 2020 17:17:20 -0700


    [ 
https://issues.apache.org/jira/browse/HDFS-13183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099434#comment-17099434
 ]


Wei-Chiu Chuang commented on HDFS-13183:
----------------------------------------

I am really sorry; I meant to review this earlier but got distracted.

I would like to push this feature to the finish line, because CRFS is a big 
feature and will take time to stabilize. Plus, it requires an additional 
Observer NameNode, and the logistics of adding an extra master namenode add 
complexity.

A few comments on the patch:
* Does it work in a federated cluster? IIRC you have a large federated cluster, 
so I am assuming the answer is yes, but does it work out of the box or does it 
require extra configuration? (Sorry, I don't have much experience with HDFS 
federation.)
* It looks like the balancer determines which NN is the SbNN at start, and then 
uses it until the end. There are two issues:
** Failover. If a failover happens, the balancer can't adapt and will then send 
the requests to the ANN. That is fine, as it shouldn't fail the balancer, but it 
increases the overhead on the new ANN.
** Multiple standby namenode support. The balancer always chooses the first 
available standby namenode. This is fine, since in any case there can be only 
one balancer running at a time.
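
To illustrate the failover point, here is a minimal sketch (not code from the 
patch) of re-selecting the getBlocks target before each round instead of once 
at start. All names in it (GetBlocksTargetSketch, NameNode, HaState, 
pickTarget) are hypothetical stand-ins and not Hadoop types; it only assumes 
the balancer can learn each NN's current HA state.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

final class GetBlocksTargetSketch {
    // Hypothetical HA states, for illustration only.
    enum HaState { ACTIVE, STANDBY, UNREACHABLE }

    // Hypothetical stand-in for a NameNode handle; not a Hadoop class.
    static final class NameNode {
        final String name;
        HaState state;

        NameNode(String name, HaState state) {
            this.name = name;
            this.state = state;
        }
    }

    // Meant to be called before every getBlocks round: prefer the first
    // standby, and fall back to the active only if no standby is available.
    static Optional<NameNode> pickTarget(List<NameNode> nameNodes) {
        NameNode active = null;
        for (NameNode nn : nameNodes) {
            if (nn.state == HaState.STANDBY) {
                return Optional.of(nn);
            }
            if (nn.state == HaState.ACTIVE && active == null) {
                active = nn;
            }
        }
        return Optional.ofNullable(active);
    }

    public static void main(String[] args) {
        NameNode nn1 = new NameNode("nn1", HaState.ACTIVE);
        NameNode nn2 = new NameNode("nn2", HaState.STANDBY);
        List<NameNode> nns = Arrays.asList(nn1, nn2);

        System.out.println("before failover: " + pickTarget(nns).get().name);

        // Simulated failover: the two NameNodes swap roles.
        nn1.state = HaState.STANDBY;
        nn2.state = HaState.ACTIVE;
        System.out.println("after failover: " + pickTarget(nns).get().name);
    }
}
```

With a per-round selection like this, the requests would follow the standby 
role after a failover rather than staying on the NN chosen at start.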

Also, just want to say that you don't actually need to make 
FSNamesystem#getBlocks() UNCHECKED. If dfs.ha.allow.stale.reads is true, the 
Standby NN accepts the request as well. That requires an extra configuration, 
though, so it is probably not ideal.
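
The two options can be compared with a simplified model of the standby-side 
check. This is a sketch written for this comment, not the actual Hadoop code: 
StandbyCheckSketch and standbyAccepts are hypothetical names, and it assumes 
getBlocks would otherwise be checked as a READ operation.

```java
final class StandbyCheckSketch {
    // Simplified operation categories; a hypothetical subset for illustration.
    enum OperationCategory { UNCHECKED, READ, WRITE }

    // Returns true if, in this simplified model, a Standby NN would serve the
    // call: UNCHECKED always passes, READ passes only when stale reads are
    // allowed (modeling dfs.ha.allow.stale.reads=true), WRITE never passes.
    static boolean standbyAccepts(OperationCategory op, boolean allowStaleReads) {
        if (op == OperationCategory.UNCHECKED) {
            return true;
        }
        return op == OperationCategory.READ && allowStaleReads;
    }

    public static void main(String[] args) {
        // Option 1: getBlocks made UNCHECKED, no extra configuration needed.
        System.out.println(standbyAccepts(OperationCategory.UNCHECKED, false));
        // Option 2: getBlocks stays a READ, stale reads enabled by config.
        System.out.println(standbyAccepts(OperationCategory.READ, true));
        // Default: a READ on the standby with stale reads disabled.
        System.out.println(standbyAccepts(OperationCategory.READ, false));
    }
}
```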

> Standby NameNode process getBlocks request to reduce Active load
> ----------------------------------------------------------------
>
>                 Key: HDFS-13183
>                 URL: https://issues.apache.org/jira/browse/HDFS-13183
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: balancer & mover, namenode
>            Reporter: Xiaoqiao He
>            Assignee: Xiaoqiao He
>            Priority: Major
>         Attachments: HDFS-13183-trunk.001.patch, HDFS-13183-trunk.002.patch, 
> HDFS-13183-trunk.003.patch, HDFS-13183.004.patch, HDFS-13183.005.patch
>
>
> The performance of the Active NameNode can be impacted when {{Balancer}} 
> requests #getBlocks, since querying the blocks of overly full DNs is 
> currently extremely inefficient. The main reason is that 
> {{NameNodeRpcServer#getBlocks}} holds the read lock for a long time. In the 
> extreme case, all handlers of the Active NameNode RPC server are occupied by 
> one reader, {{NameNodeRpcServer#getBlocks}}, and other write operation calls, 
> so the Active NameNode enters a state of false death for seconds or even 
> minutes.
> Similar performance concerns about the Balancer have been reported in 
> HDFS-9412, HDFS-7967, etc.
> If the Standby NameNode can shoulder the heavy #getBlocks burden, it could 
> speed up balancing and reduce the performance impact on the Active NameNode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

