[
https://issues.apache.org/jira/browse/HDFS-17090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17743790#comment-17743790
]
Xiaoqiao He commented on HDFS-17090:
------------------------------------
We mark the `DatanodeProtocol#registerDatanode` interface as 'Idempotent' now.
However, the implementation of registerDatanode is not actually idempotent: for
example, `blockReportCount` is always reset to 0, which affects other logic. So
another option is to mark #registerDatanode as `AtMostOnce` and use the
RetryCache to cover this corner case.
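
As a rough illustration, a minimal sketch of what that change might look like,
assuming the existing retry annotations in org.apache.hadoop.io.retry;
DatanodeProtocolSketch is a placeholder interface for illustration, not a
proposed patch to DatanodeProtocol:

```java
// Minimal sketch only: shows the retry annotation being swapped on the
// register call. DatanodeProtocolSketch is a placeholder, not the real
// DatanodeProtocol; the annotation is the existing one in
// org.apache.hadoop.io.retry.
import java.io.IOException;

import org.apache.hadoop.hdfs.server.protocol.DatanodeRegistration;
import org.apache.hadoop.io.retry.AtMostOnce;

public interface DatanodeProtocolSketch {

  // Currently annotated @Idempotent in DatanodeProtocol. Marking it
  // @AtMostOnce signals that blind retries are not safe; the NameNode side
  // would then need a RetryCache entry to return the previous result for a
  // retried register instead of re-running registration (which resets
  // blockReportCount).
  @AtMostOnce
  DatanodeRegistration registerDatanode(DatanodeRegistration registration)
      throws IOException;
}
```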
> Decommission will be stuck for a long time on restart because of overlapped
> processing of Register and BlockReport.
> ----------------------------------------------------------------------------------------------------------
>
> Key: HDFS-17090
> URL: https://issues.apache.org/jira/browse/HDFS-17090
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Reporter: Xiaoqiao He
> Assignee: Xiaoqiao He
> Priority: Major
>
> I recently hit a corner case where decommissioning DataNodes impacts the
> performance of the NameNode. After digging in carefully, I have reproduced it
> as follows.
> a. Add some DataNodes to the exclude file and prepare to decommission them.
> b. Execute bin/hdfs dfsadmin -refresh (this step is optional).
> c. Restart the NameNode, for an upgrade or another reason, before the
> decommission completes.
> d. All DataNodes are triggered to register and send FBRs.
> e. The load on the NameNode becomes very high; in particular, the 8040
> CallQueue stays full for a long time because of the flood of
> register/heartbeat/FBR RPCs from the DataNodes.
> f. A decommission-in-progress node will not finish decommissioning until the
> next FBR, even though all replicas on that node have been processed. The
> reason is the request order register-heartbeat-(blockreport, register): the
> second register could be a retried RPC from the DataNode (there is no further
> DataNode log to confirm this), and for (blockreport, register) the NameNode
> can process one storage, then the register, then the remaining storages, in
> that order.
> g. Because of the second register RPC, the related DataNodes are marked
> unhealthy by BlockManager#isNodeHealthyForDecommissionOrMaintenance, so the
> decommission is stuck until the next FBR (see the sketch after this
> description). The NameNode therefore has to scan this DataNode in every
> monitor round to check whether it can complete, which holds the global write
> lock and impacts NameNode performance.
> To improve this, I think we could filter out the repeated register RPC
> requests during startup progress. I have not thought carefully about whether
> filtering register directly would introduce other risks. Any further
> discussion is welcome.
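
To make steps f and g concrete, here is a simplified, hypothetical model of the
interleaving. Storage, Datanode, register(), and healthyForDecommission() below
are stand-ins for the per-storage block-report state and the
BlockManager#isNodeHealthyForDecommissionOrMaintenance check, not the actual
HDFS code:

```java
// Hypothetical, simplified model of the race described above; class and
// field names are stand-ins, not the real HDFS classes.
import java.util.ArrayList;
import java.util.List;

class Storage {
  int blockReportCount;            // 0 means no FBR processed yet
}

class Datanode {
  final List<Storage> storages = new ArrayList<>();

  // Re-registration resets per-storage state, analogous to the
  // blockReportCount reset mentioned in the comment above.
  void register() {
    for (Storage s : storages) {
      s.blockReportCount = 0;
    }
  }

  // Decommission can only finish once every storage has reported.
  boolean healthyForDecommission() {
    for (Storage s : storages) {
      if (s.blockReportCount == 0) {
        return false;
      }
    }
    return true;
  }
}

public class DecommissionRaceSketch {
  public static void main(String[] args) {
    Datanode dn = new Datanode();
    dn.storages.add(new Storage());
    dn.storages.add(new Storage());

    dn.register();                          // initial register after NN restart
    dn.storages.get(0).blockReportCount++;  // first storage's FBR processed
    dn.register();                          // retried register RPC interleaves
    dn.storages.get(1).blockReportCount++;  // remaining storage's FBR processed

    // The retried register wiped the first storage's count, so the node is
    // treated as unhealthy and decommission stays pending until the next FBR.
    System.out.println("healthy = " + dn.healthyForDecommission()); // false
  }
}
```

Without the retried register in the middle, both storages would keep a non-zero
count after their FBRs and the health check would pass.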