huangzhaobo99 opened a new pull request, #6393:
URL: https://github.com/apache/hadoop/pull/6393

   
   ### Description of PR
   JIRA: https://issues.apache.org/jira/browse/HDFS-17305
   
   Slow-node avoidance and load-based avoidance already exist and are mainly 
implemented in the BlockPlacementPolicyDefault class.
   
   1. When an exclusion condition is triggered, the NameNode (NN) prints a log 
entry, which can be used to troubleshoot anomalies in the NN. The code is as 
follows:
   ```java
   ...
   if (!node.isInService()) {
     logNodeIsNotChosen(node, NodeNotChosenReason.NOT_IN_SERVICE);
     return false;
   }
   
   if (avoidStaleNodes) {
     if (node.isStale(this.staleInterval)) {
       logNodeIsNotChosen(node, NodeNotChosenReason.NODE_STALE);
       return false;
     }
   }
   ...
   ```
   2. Each time an exclusion condition is triggered, we can also record it in 
metrics that count the total number of exclusions.
   
   3. These metrics can then be viewed through Prometheus and Grafana to 
observe how the cluster is behaving when DataNodes are selected.
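
To illustrate the idea in items 1 to 3, here is a minimal, self-contained sketch of counting exclusions per reason. It is not the code in this patch: the class `NodeExclusionMetricsSketch`, its methods, and the boolean parameters are hypothetical stand-ins, and the local `NodeNotChosenReason` enum only mirrors the two reasons shown in the snippet above. The real change would presumably go through Hadoop's metrics system rather than plain counters.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sketch: one counter per exclusion reason, bumped next to the log call. */
public class NodeExclusionMetricsSketch {

  /** Local stand-in for the two reasons shown in the PR description. */
  public enum NodeNotChosenReason { NOT_IN_SERVICE, NODE_STALE }

  // All keys are created up front, so the map is only read after construction.
  private final Map<NodeNotChosenReason, LongAdder> counters =
      new EnumMap<>(NodeNotChosenReason.class);

  public NodeExclusionMetricsSketch() {
    for (NodeNotChosenReason reason : NodeNotChosenReason.values()) {
      counters.put(reason, new LongAdder());
    }
  }

  /** Records one exclusion; would sit beside the existing log statement. */
  public void recordExclusion(NodeNotChosenReason reason) {
    counters.get(reason).increment();
  }

  public long getExclusions(NodeNotChosenReason reason) {
    return counters.get(reason).sum();
  }

  public long getTotalExclusions() {
    long total = 0;
    for (LongAdder counter : counters.values()) {
      total += counter.sum();
    }
    return total;
  }

  /** Simplified version of the checks above; the node state is passed in as booleans. */
  public boolean isGoodDatanode(boolean inService, boolean stale, boolean avoidStaleNodes) {
    if (!inService) {
      recordExclusion(NodeNotChosenReason.NOT_IN_SERVICE);
      return false;
    }
    if (avoidStaleNodes && stale) {
      recordExclusion(NodeNotChosenReason.NODE_STALE);
      return false;
    }
    return true;
  }

  public static void main(String[] args) {
    NodeExclusionMetricsSketch metrics = new NodeExclusionMetricsSketch();
    metrics.isGoodDatanode(false, false, true); // rejected: not in service
    metrics.isGoodDatanode(true, true, true);   // rejected: stale
    metrics.isGoodDatanode(true, true, false);  // accepted: stale nodes not avoided
    for (NodeNotChosenReason reason : NodeNotChosenReason.values()) {
      System.out.println(reason + " = " + metrics.getExclusions(reason));
    }
    System.out.println("total = " + metrics.getTotalExclusions());
  }
}
```

Keeping one counter per reason, rather than a single total, lets a dashboard show which exclusion condition dominates at any given time.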
   
   
   ### How was this patch tested?
    Added the unit test TestNameNodeMetrics#testAvoidTargetDataNodeMetrics.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

