ivandika3 commented on code in PR #9926:
URL: https://github.com/apache/ozone/pull/9926#discussion_r2945845042
##########
hadoop-hdds/server-scm/src/main/java/org/apache/hadoop/hdds/scm/node/DeadNodeHandler.java:
##########
@@ -119,15 +119,24 @@ public void onMessage(final DatanodeDetails datanodeDetails,
deletedBlockLog.onDatanodeDead(datanodeDetails.getID());
}
- //move dead datanode out of ClusterNetworkTopology
- NetworkTopology nt = nodeManager.getClusterNetworkTopologyMap();
- if (nt.contains(datanodeDetails)) {
- nt.remove(datanodeDetails);
- //make sure after DN is removed from topology,
- //DatanodeDetails instance returned from nodeStateManager has no parent.
- Preconditions.checkState(
- nodeManager.getNode(datanodeDetails.getID())
- .getParent() == null);
+ // Only remove from topology if the node is still DEAD. Between the time
+ // the DEAD_NODE event was fired and now, the node may have been
+ // resurrected (DEAD -> HEALTHY_READONLY) via a heartbeat. Removing a
+ // resurrected node from the topology would leave it reachable but
+ // invisible to the placement policy.
+ NodeStatus currentStatus =
+ nodeManager.getNodeStatus(datanodeDetails);
+ if (currentStatus.getHealth() == HddsProtos.NodeState.DEAD) {
Review Comment:
Good point, I added another check at the start of the handler. Technically, we
would need to repeat the check before each of these actions, but that seems to
be overkill.
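
For illustration only, here is a minimal sketch of what an entry-level guard could look like. The class, enum, and method names below are hypothetical stand-ins, not the actual Ozone `NodeManager` or `NetworkTopology` APIs, and the real handler does more cleanup than this:

```java
import java.util.Map;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for the dead-node handler.
// None of these types are the real Ozone classes.
class DeadNodeGuardSketch {
  enum Health { HEALTHY, HEALTHY_READONLY, STALE, DEAD }

  // Current health per node, as a heartbeat would update it.
  private final Map<UUID, Health> health = new ConcurrentHashMap<>();
  // Nodes currently present in the (simplified) topology.
  private final Set<UUID> topology = ConcurrentHashMap.newKeySet();

  void register(UUID id, Health h) {
    health.put(id, h);
    topology.add(id);
  }

  void setHealth(UUID id, Health h) {
    health.put(id, h);
  }

  boolean inTopology(UUID id) {
    return topology.contains(id);
  }

  // Returns true if cleanup ran, false if the event was stale.
  boolean onDeadNodeEvent(UUID id) {
    // Single check at the start: skip everything if the node is unknown
    // or was resurrected after the DEAD_NODE event was fired.
    if (health.get(id) != Health.DEAD) {
      return false;
    }
    // Cleanup actions would go here; only topology removal is modelled.
    topology.remove(id);
    return true;
  }

  public static void main(String[] args) {
    DeadNodeGuardSketch handler = new DeadNodeGuardSketch();
    UUID resurrected = UUID.randomUUID();
    handler.register(resurrected, Health.DEAD);
    // A heartbeat arrives before the queued event is handled.
    handler.setHealth(resurrected, Health.HEALTHY_READONLY);
    System.out.println("cleanup ran: " + handler.onDeadNodeEvent(resurrected));
    System.out.println("still in topology: " + handler.inTopology(resurrected));
  }
}
```

Note that a single check at the start only narrows the window: a heartbeat can still arrive between the check and a later action, which is why a check before every action would be needed to be strictly correct.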