[GitHub] [hadoop] KevinWikant commented on a change in pull request #3675: HDFS-16303. Improve handling of datanode lost while decommissioning

GitBox Wed, 22 Dec 2021 19:22:56 -0800


KevinWikant commented on a change in pull request #3675:
URL: https://github.com/apache/hadoop/pull/3675#discussion_r774293380




##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminMonitorBase.java
##########
@@ -151,4 +162,34 @@ public int getPendingNodeCount() {
   public Queue<DatanodeDescriptor> getPendingNodes() {
     return pendingNodes;
   }
+
+  /**
+   * If node "is dead while in Decommission In Progress", it cannot be decommissioned
+   * until it becomes healthy again. If there are more pendingNodes than can be tracked
+   * & some unhealthy tracked nodes, then re-queue the unhealthy tracked nodes
+   * to avoid blocking decommissioning of healthy nodes.
+   *
+   * @param unhealthyDns The unhealthy datanodes which may be re-queued
+   * @param numDecommissioningNodes The total number of nodes being decommissioned
+   * @return List of unhealthy nodes to be re-queued

Review comment:
       Corrected the @return Javadoc from List to Stream.

##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminMonitorBase.java
##########
@@ -35,12 +38,20 @@
 public abstract class DatanodeAdminMonitorBase
     implements DatanodeAdminMonitorInterface, Configurable {
 
+  /**
+   * Sort by lastUpdate time descending order, such that unhealthy
+   * nodes are de-prioritized given they cannot be decommissioned.
+   */
+  public static final Comparator<DatanodeDescriptor> PENDING_NODES_QUEUE_COMPARATOR =

Review comment:
       Good call, changed it to package-private.
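
       For anyone reading along, here is a minimal sketch of the ordering the
       Javadoc above describes. Node is a hypothetical stand-in for
       DatanodeDescriptor (only a name and a lastUpdate timestamp), and
       newPendingQueue() is an illustrative helper, not part of the patch.

```java
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.Queue;

final class PendingQueueSketch {
  // Hypothetical stand-in for DatanodeDescriptor: just what the ordering needs.
  static final class Node {
    final String name;
    final long lastUpdate; // time of the last heartbeat, in milliseconds

    Node(String name, long lastUpdate) {
      this.name = name;
      this.lastUpdate = lastUpdate;
    }

    long getLastUpdate() {
      return lastUpdate;
    }
  }

  // Package-private; descending by lastUpdate, so nodes heard from most
  // recently sort first and long-silent (likely dead) nodes sort last.
  static final Comparator<Node> PENDING_NODES_QUEUE_COMPARATOR =
      (a, b) -> Long.compare(b.getLastUpdate(), a.getLastUpdate());

  // Illustrative helper: a priority queue whose head is the freshest node.
  static Queue<Node> newPendingQueue() {
    return new PriorityQueue<>(PENDING_NODES_QUEUE_COMPARATOR);
  }
}
```

       With this ordering, poll() hands back recently updated nodes first, so
       a node that stopped heartbeating waits behind the healthy ones.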

##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminMonitorBase.java
##########
@@ -151,4 +162,34 @@ public int getPendingNodeCount() {
   public Queue<DatanodeDescriptor> getPendingNodes() {
     return pendingNodes;
   }
+
+  /**
+   * If node "is dead while in Decommission In Progress", it cannot be decommissioned
+   * until it becomes healthy again. If there are more pendingNodes than can be tracked
+   * & some unhealthy tracked nodes, then re-queue the unhealthy tracked nodes
+   * to avoid blocking decommissioning of healthy nodes.
+   *
+   * @param unhealthyDns The unhealthy datanodes which may be re-queued
+   * @param numDecommissioningNodes The total number of nodes being decommissioned
+   * @return List of unhealthy nodes to be re-queued
+   */
+  Stream<DatanodeDescriptor> identifyUnhealthyNodesToRequeue(

Review comment:
       Renamed to "getUnhealthyNodesToRequeue".
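
       To make the re-queue rule in the Javadoc concrete, here is a rough
       sketch of one way it could be implemented. This is not the code in the
       patch: Node is a hypothetical stand-in for DatanodeDescriptor, the
       limit is passed in as a parameter for self-containment, and the choices
       to re-queue only the excess over the limit and to prefer the nodes that
       have been silent the longest are my assumptions.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Stream;

final class RequeueSketch {
  // Hypothetical stand-in for DatanodeDescriptor.
  static final class Node {
    final String name;
    final long lastUpdate;

    Node(String name, long lastUpdate) {
      this.name = name;
      this.lastUpdate = lastUpdate;
    }

    long getLastUpdate() {
      return lastUpdate;
    }
  }

  // Returns the unhealthy nodes to move back to the pending queue.
  // Assumption: nothing is re-queued unless more nodes are decommissioning
  // than can be tracked, and at most the excess is re-queued.
  static Stream<Node> getUnhealthyNodesToRequeue(
      List<Node> unhealthyDns,
      int numDecommissioningNodes,
      int maxConcurrentTrackedNodes) {
    int excess = numDecommissioningNodes - maxConcurrentTrackedNodes;
    if (unhealthyDns.isEmpty() || excess <= 0) {
      return Stream.empty();
    }
    int numToRequeue = Math.min(excess, unhealthyDns.size());
    // Oldest lastUpdate first: nodes silent the longest are re-queued first.
    return unhealthyDns.stream()
        .sorted(Comparator.comparingLong(Node::getLastUpdate))
        .limit(numToRequeue);
  }
}
```

       Returning a Stream keeps the result lazy, so the caller can collect or
       iterate it however fits the surrounding monitor code.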

##########
File path: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/blockmanagement/DatanodeAdminDefaultMonitor.java
##########
@@ -270,12 +273,31 @@ private void check() {
         // an invalid state.
         LOG.warn("DatanodeAdminMonitor caught exception when processing node "
             + "{}.", dn, e);
-        pendingNodes.add(dn);
+        getPendingNodes().add(dn);
         toRemove.add(dn);
       } finally {
         iterkey = dn;
       }
     }
+
+    // Having more nodes decommissioning than can be tracked will impact decommissioning
+    // performance due to queueing delay
+    int numTrackedNodes = outOfServiceNodeBlocks.size() - toRemove.size();
+    int numQueuedNodes = getPendingNodes().size();
+    int numDecommissioningNodes = numTrackedNodes + numQueuedNodes;
+    if (numDecommissioningNodes > maxConcurrentTrackedNodes) {
+      LOG.warn(
+          "There are {} nodes decommissioning but only {} nodes will be tracked at a time. "
+              + "{} nodes are currently queued waiting to be decommissioned.",
+          numDecommissioningNodes, maxConcurrentTrackedNodes, numQueuedNodes);

Review comment:
       updated
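
       As a quick illustration of the arithmetic in the hunk above, here is a
       small standalone sketch. The class and method names are hypothetical;
       only the variable names mirror the diff.

```java
final class TrackedNodeCountSketch {
  // Nodes still tracked after this pass, plus nodes waiting in the queue.
  static int numDecommissioningNodes(
      int outOfServiceEntries, int toRemoveCount, int numQueuedNodes) {
    int numTrackedNodes = outOfServiceEntries - toRemoveCount;
    return numTrackedNodes + numQueuedNodes;
  }

  // True when the warning in the diff would be logged.
  static boolean exceedsTrackingLimit(
      int numDecommissioningNodes, int maxConcurrentTrackedNodes) {
    return numDecommissioningNodes > maxConcurrentTrackedNodes;
  }
}
```

       For example, with 100 tracked entries, 3 of them marked for removal,
       20 queued nodes and a limit of 100, the total is 97 + 20 = 117, which
       exceeds the limit and would trigger the warning.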




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.