[ 
https://issues.apache.org/jira/browse/HDFS-17305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801221#comment-17801221
 ] 

ASF GitHub Bot commented on HDFS-17305:
---------------------------------------

huangzhaobo99 opened a new pull request, #6393:
URL: https://github.com/apache/hadoop/pull/6393

   
   ### Description of PR
   JIRA: https://issues.apache.org/jira/browse/HDFS-17305
   
   Slow-node avoidance and load avoidance already exist, implemented mainly in 
the BlockPlacementPolicyDefault class.
   
   1. When an exclusion condition is triggered, the NameNode (nn) logs the 
reason, so anomalies can be troubleshot by checking the nn logs. The code is 
as follows:
   ```java
   ...
   if (!node.isInService()) {
     logNodeIsNotChosen(node, NodeNotChosenReason.NOT_IN_SERVICE);
     return false;
   }
   
   if (avoidStaleNodes) {
     if (node.isStale(this.staleInterval)) {
       logNodeIsNotChosen(node, NodeNotChosenReason.NODE_STALE);
       return false;
     }
   }
   ...
   ```
   2. When an exclusion condition is triggered, we can also record it in 
metrics and count the total number of exclusions.
   
   3. These metrics can then be viewed through Prometheus + Grafana to observe 
how the cluster is behaving when selecting DataNodes.
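   
   As a rough illustration of point 2, the sketch below keeps one counter per 
exclusion reason. It is not the code in this PR and does not use Hadoop's 
metrics classes: `AvoidReasonCounters`, `recordExclusion`, and 
`isGoodCandidate` are hypothetical names, and the enum is a stand-in that lists 
only the two reasons visible in the snippet above.
   
   ```java
   import java.util.EnumMap;
   import java.util.Map;
   import java.util.concurrent.atomic.LongAdder;
   
   /** Hypothetical per-reason exclusion counters; a sketch, not the PR's code. */
   public class AvoidReasonCounters {
     /** Stand-in enum: only the two reasons shown in the snippet above. */
     public enum NodeNotChosenReason { NOT_IN_SERVICE, NODE_STALE }
   
     // The map is fully populated in the constructor and never modified
     // afterwards, so concurrent reads are safe; LongAdder handles the
     // concurrent increments.
     private final Map<NodeNotChosenReason, LongAdder> counts =
         new EnumMap<>(NodeNotChosenReason.class);
   
     public AvoidReasonCounters() {
       for (NodeNotChosenReason reason : NodeNotChosenReason.values()) {
         counts.put(reason, new LongAdder());
       }
     }
   
     /** Would be called next to each logNodeIsNotChosen(...) call. */
     public void recordExclusion(NodeNotChosenReason reason) {
       counts.get(reason).increment();
     }
   
     /** Total number of exclusions recorded so far for one reason. */
     public long get(NodeNotChosenReason reason) {
       return counts.get(reason).sum();
     }
   
     /** Simplified version of the checks above, with plain booleans as inputs. */
     public boolean isGoodCandidate(boolean inService, boolean stale,
                                    boolean avoidStaleNodes) {
       if (!inService) {
         recordExclusion(NodeNotChosenReason.NOT_IN_SERVICE);
         return false;
       }
       if (avoidStaleNodes && stale) {
         recordExclusion(NodeNotChosenReason.NODE_STALE);
         return false;
       }
       return true;
     }
   }
   ```
   
   Keeping a separate counter per reason means a dashboard can show which 
exclusion condition dominates, rather than a single combined total.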
   
   
   ### How was this patch tested?
    Added the unit test TestNameNodeMetrics#testAvoidTargetDataNodeMetrics.




> Add avoid datanode reason count related metrics to namenode.
> ------------------------------------------------------------
>
>                 Key: HDFS-17305
>                 URL: https://issues.apache.org/jira/browse/HDFS-17305
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: huangzhaobo99
>            Assignee: huangzhaobo99
>            Priority: Minor
>
> Slow-node avoidance and load avoidance already exist, implemented mainly in 
> the BlockPlacementPolicyDefault class.
> 1. When an exclusion condition is triggered, the NameNode (nn) logs the 
> reason, so anomalies can be troubleshot by checking the nn logs. The code is 
> as follows:
> {code:java}
> ...
> if (!node.isInService()) {
>   logNodeIsNotChosen(node, NodeNotChosenReason.NOT_IN_SERVICE);
>   return false;
> }
> if (avoidStaleNodes) {
>   if (node.isStale(this.staleInterval)) {
>     logNodeIsNotChosen(node, NodeNotChosenReason.NODE_STALE);
>     return false;
>   }
> }
> ...{code}
> 2. When an exclusion condition is triggered, we can also record it in 
> metrics and count the total number of exclusions.
> 3. These metrics can then be viewed through Prometheus + Grafana to observe 
> how the cluster is behaving when selecting DataNodes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-issues-h...@hadoop.apache.org
