[
https://issues.apache.org/jira/browse/HDFS-17090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17743790#comment-17743790
]
Xiaoqiao He commented on HDFS-17090:
------------------------------------
We mark the `DatanodeProtocol#registerDatanode` interface as 'Idempotent' now.
However, the implementation of registerDatanode is not actually idempotent: for
example, `blockReportCount` is always reset to 0, which affects other logic. So
another option is to mark #registerDatanode as `AtMostOnce` and use the
RetryCache to cover this corner case.
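
As a rough illustration, a minimal sketch of what that change might look like,
assuming the existing retry annotations in org.apache.hadoop.io.retry;
DatanodeProtocolSketch is a placeholder interface for illustration, not a
proposed patch to DatanodeProtocol:

```java
// Minimal sketch only: shows the retry annotation being swapped on the
// register call. DatanodeProtocolSketch is a placeholder, not the real
// DatanodeProtocol; the annotation is the existing one in
// org.apache.hadoop.io.retry.
import java.io.IOException;

import org.apache.hadoop.hdfs.server.protocol.DatanodeRegistration;
import org.apache.hadoop.io.retry.AtMostOnce;

public interface DatanodeProtocolSketch {

  // Currently annotated @Idempotent in DatanodeProtocol. Marking it
  // @AtMostOnce signals that blind retries are not safe; the NameNode side
  // would then need a RetryCache entry to return the previous result for a
  // retried register instead of re-running registration (which resets
  // blockReportCount).
  @AtMostOnce
  DatanodeRegistration registerDatanode(DatanodeRegistration registration)
      throws IOException;
}
```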
> Decommission will be stuck for a long time on restart because of overlapped
> processing of Register and BlockReport.
> ----------------------------------------------------------------------------------------------------------
>
> Key: HDFS-17090
> URL: https://issues.apache.org/jira/browse/HDFS-17090
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Reporter: Xiaoqiao He
> Assignee: Xiaoqiao He
> Priority: Major
>
> I recently hit a corner case where decommissioning DataNodes impacts the
> performance of the NameNode. After digging in carefully, I have reproduced it
> as follows.
> a. Add some DataNodes to the exclude file and prepare to decommission them.
> b. Execute bin/hdfs dfsadmin -refresh (this step is optional).
> c. Restart the NameNode, for an upgrade or another reason, before the
> decommission completes.
> d. All DataNodes are triggered to register and send FBRs.
> e. The load on the NameNode becomes very high; in particular, the 8040
> CallQueue stays full for a long time because of the flood of
> register/heartbeat/FBR RPCs from the DataNodes.
> f. A decommission-in-progress node will not finish decommissioning until the
> next FBR, even though all replicas on that node have been processed. The
> reason is the request order register-heartbeat-(blockreport, register): the
> second register could be a retried RPC from the DataNode (there is no further
> DataNode log to confirm this), and for (blockreport, register) the NameNode
> can process one storage, then the register, then the remaining storages, in
> that order.
> g. Because of the second register RPC, the related DataNodes are marked
> unhealthy by BlockManager#isNodeHealthyForDecommissionOrMaintenance, so the
> decommission is stuck until the next FBR (see the sketch after this
> description). The NameNode therefore has to scan this DataNode in every
> monitor round to check whether it can complete, which holds the global write
> lock and impacts NameNode performance.
> To improve this, I think we could filter out the repeated register RPC
> requests during startup progress. I have not thought carefully about whether
> filtering register directly would introduce other risks. Any further
> discussion is welcome.
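
To make steps f and g concrete, here is a simplified, hypothetical model of the
interleaving. Storage, Datanode, register(), and healthyForDecommission() below
are stand-ins for the per-storage block-report state and the
BlockManager#isNodeHealthyForDecommissionOrMaintenance check, not the actual
HDFS code:

```java
// Hypothetical, simplified model of the race described above; class and
// field names are stand-ins, not the real HDFS classes.
import java.util.ArrayList;
import java.util.List;

class Storage {
  int blockReportCount;            // 0 means no FBR processed yet
}

class Datanode {
  final List<Storage> storages = new ArrayList<>();

  // Re-registration resets per-storage state, analogous to the
  // blockReportCount reset mentioned in the comment above.
  void register() {
    for (Storage s : storages) {
      s.blockReportCount = 0;
    }
  }

  // Decommission can only finish once every storage has reported.
  boolean healthyForDecommission() {
    for (Storage s : storages) {
      if (s.blockReportCount == 0) {
        return false;
      }
    }
    return true;
  }
}

public class DecommissionRaceSketch {
  public static void main(String[] args) {
    Datanode dn = new Datanode();
    dn.storages.add(new Storage());
    dn.storages.add(new Storage());

    dn.register();                          // initial register after NN restart
    dn.storages.get(0).blockReportCount++;  // first storage's FBR processed
    dn.register();                          // retried register RPC interleaves
    dn.storages.get(1).blockReportCount++;  // remaining storage's FBR processed

    // The retried register wiped the first storage's count, so the node is
    // treated as unhealthy and decommission stays pending until the next FBR.
    System.out.println("healthy = " + dn.healthyForDecommission()); // false
  }
}
```

Without the retried register in the middle, both storages would keep a non-zero
count after their FBRs and the health check would pass.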