KevinWikant commented on a change in pull request #3675:
URL: https://github.com/apache/hadoop/pull/3675#discussion_r774293380
##########
File path:
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminMonitorBase.java
##########
@@ -151,4 +162,34 @@ public int getPendingNodeCount() {
public Queue<DatanodeDescriptor> getPendingNodes() {
return pendingNodes;
}
+
+ /**
+ * If node "is dead while in Decommission In Progress", it cannot be decommissioned
+ * until it becomes healthy again. If there are more pendingNodes than can be tracked
+ * & some unhealthy tracked nodes, then re-queue the unhealthy tracked nodes
+ * to avoid blocking decommissioning of healthy nodes.
+ *
+ * @param unhealthyDns The unhealthy datanodes which may be re-queued
+ * @param numDecommissioningNodes The total number of nodes being decommissioned
+ * @return List of unhealthy nodes to be re-queued
Review comment:
Corrected the @return description from List to Stream.
##########
File path:
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminMonitorBase.java
##########
@@ -35,12 +38,20 @@
public abstract class DatanodeAdminMonitorBase
implements DatanodeAdminMonitorInterface, Configurable {
+ /**
+ * Sort by lastUpdate time descending order, such that unhealthy
+ * nodes are de-prioritized given they cannot be decommissioned.
+ */
+ public static final Comparator<DatanodeDescriptor> PENDING_NODES_QUEUE_COMPARATOR =
Review comment:
Good call, modified to package-private.
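
For anyone following along, here is a minimal sketch of the ordering the Javadoc above describes. It is not the code in the PR. `NodeStub` and `PendingQueueSketch` are hypothetical stand-ins, and it assumes `lastUpdate` is a timestamp in milliseconds exposed through a `getLastUpdate()` getter.

```java
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.Queue;

// Hypothetical stand-in for DatanodeDescriptor; only what the ordering needs.
final class NodeStub {
  final String name;
  final long lastUpdate; // assumed: time of last heartbeat, in milliseconds

  NodeStub(String name, long lastUpdate) {
    this.name = name;
    this.lastUpdate = lastUpdate;
  }

  long getLastUpdate() {
    return lastUpdate;
  }
}

final class PendingQueueSketch {
  // Descending lastUpdate: recently heard-from nodes first, stale nodes last.
  // Package-private, matching the visibility change discussed in this thread.
  static final Comparator<NodeStub> PENDING_NODES_QUEUE_COMPARATOR =
      Comparator.comparingLong(NodeStub::getLastUpdate).reversed();

  static Queue<NodeStub> newPendingQueue() {
    return new PriorityQueue<>(PENDING_NODES_QUEUE_COMPARATOR);
  }

  public static void main(String[] args) {
    Queue<NodeStub> pending = newPendingQueue();
    pending.add(new NodeStub("stale", 1_000L));
    pending.add(new NodeStub("fresh", 9_000L));
    pending.add(new NodeStub("recent", 5_000L));
    // Prints fresh, recent, stale (one per line).
    while (!pending.isEmpty()) {
      System.out.println(pending.poll().name);
    }
  }
}
```

With this ordering, a node that has stopped heartbeating drifts to the back of the queue, so healthy nodes are picked up for tracking first.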
##########
File path:
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminMonitorBase.java
##########
@@ -151,4 +162,34 @@ public int getPendingNodeCount() {
public Queue<DatanodeDescriptor> getPendingNodes() {
return pendingNodes;
}
+
+ /**
+ * If node "is dead while in Decommission In Progress", it cannot be decommissioned
+ * until it becomes healthy again. If there are more pendingNodes than can be tracked
+ * & some unhealthy tracked nodes, then re-queue the unhealthy tracked nodes
+ * to avoid blocking decommissioning of healthy nodes.
+ *
+ * @param unhealthyDns The unhealthy datanodes which may be re-queued
+ * @param numDecommissioningNodes The total number of nodes being decommissioned
+ * @return List of unhealthy nodes to be re-queued
+ */
+ Stream<DatanodeDescriptor> identifyUnhealthyNodesToRequeue(
Review comment:
Renamed to "getUnhealthyNodesToRequeue".
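
A rough sketch of the re-queue rule in the Javadoc above, for illustration only. It assumes that the number of unhealthy nodes to re-queue is capped by how far the decommissioning count exceeds `maxConcurrentTrackedNodes`; the method in the PR may differ. `RequeueSketch` is hypothetical, and plain strings stand in for `DatanodeDescriptor`.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class RequeueSketch {
  private final int maxConcurrentTrackedNodes;

  RequeueSketch(int maxConcurrentTrackedNodes) {
    this.maxConcurrentTrackedNodes = maxConcurrentTrackedNodes;
  }

  // Returns the unhealthy nodes to hand back to the pending queue.
  // Assumption: re-queue at most as many nodes as exceed the tracking limit.
  Stream<String> getUnhealthyNodesToRequeue(
      List<String> unhealthyDns, int numDecommissioningNodes) {
    int excess = numDecommissioningNodes - maxConcurrentTrackedNodes;
    if (unhealthyDns.isEmpty() || excess <= 0) {
      // Nothing unhealthy, or every node fits in a tracking slot.
      return Stream.empty();
    }
    return unhealthyDns.stream().limit(Math.min(excess, unhealthyDns.size()));
  }

  public static void main(String[] args) {
    RequeueSketch monitor = new RequeueSketch(100);
    List<String> unhealthy = Arrays.asList("dn1", "dn2", "dn3", "dn4", "dn5");
    // 103 decommissioning nodes against a limit of 100 leaves an excess of 3.
    List<String> requeued = monitor
        .getUnhealthyNodesToRequeue(unhealthy, 103)
        .collect(Collectors.toList());
    System.out.println(requeued); // prints [dn1, dn2, dn3]
  }
}
```

Returning a `Stream` lets the caller decide whether to collect the nodes or process them one by one, which fits the List-to-Stream Javadoc correction earlier in this thread.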
##########
File path:
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminDefaultMonitor.java
##########
@@ -270,12 +273,31 @@ private void check() {
// an invalid state.
LOG.warn("DatanodeAdminMonitor caught exception when processing node "
+ "{}.", dn, e);
- pendingNodes.add(dn);
+ getPendingNodes().add(dn);
toRemove.add(dn);
} finally {
iterkey = dn;
}
}
+
+ // Having more nodes decommissioning than can be tracked will impact decommissioning
+ // performance due to queueing delay
+ int numTrackedNodes = outOfServiceNodeBlocks.size() - toRemove.size();
+ int numQueuedNodes = getPendingNodes().size();
+ int numDecommissioningNodes = numTrackedNodes + numQueuedNodes;
+ if (numDecommissioningNodes > maxConcurrentTrackedNodes) {
+ LOG.warn(
+ "There are {} nodes decommissioning but only {} nodes will be tracked at a time. "
+ + "{} nodes are currently queued waiting to be decommissioned.",
+ numDecommissioningNodes, maxConcurrentTrackedNodes, numQueuedNodes);
Review comment:
updated
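
To make the counting in this hunk concrete, here is a small standalone sketch of the same arithmetic. `QueueingWarningSketch` and its parameters are hypothetical, and `String.format` replaces the logger so the example runs on its own.

```java
import java.util.Optional;

final class QueueingWarningSketch {
  // Mirrors the hunk: tracked = out-of-service nodes minus those being removed;
  // decommissioning = tracked + queued. Warn only when that exceeds the limit.
  static Optional<String> queueingWarning(
      int numOutOfServiceNodes, int numToRemove, int numQueuedNodes,
      int maxConcurrentTrackedNodes) {
    int numTrackedNodes = numOutOfServiceNodes - numToRemove;
    int numDecommissioningNodes = numTrackedNodes + numQueuedNodes;
    if (numDecommissioningNodes <= maxConcurrentTrackedNodes) {
      return Optional.empty();
    }
    return Optional.of(String.format(
        "There are %d nodes decommissioning but only %d nodes will be tracked "
            + "at a time. %d nodes are currently queued waiting to be decommissioned.",
        numDecommissioningNodes, maxConcurrentTrackedNodes, numQueuedNodes));
  }

  public static void main(String[] args) {
    // 100 tracked + 5 queued = 105 > 100, so a warning is produced.
    queueingWarning(100, 0, 5, 100).ifPresent(System.out::println);
    // 40 tracked + 5 queued = 45 <= 100, so nothing is printed.
    queueingWarning(50, 10, 5, 100).ifPresent(System.out::println);
  }
}
```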
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]