[ https://issues.apache.org/jira/browse/HDFS-17305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

huangzhaobo99 updated HDFS-17305:
---------------------------------
    Description: 
Currently, the NameNode provides slow-node and load avoidance features, 
implemented mainly in the BlockPlacementPolicyDefault class.

1. When an exclusion condition is triggered, the NameNode (NN) logs why the 
DataNode was not chosen, and these logs can be checked to troubleshoot 
anomalies on the NN. The relevant code is:
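{code:java}
// ...
// Skip nodes that are not in service (e.g. decommissioning or in maintenance).
if (!node.isInService()) {
  logNodeIsNotChosen(node, NodeNotChosenReason.NOT_IN_SERVICE);
  return false;
}

// When stale-node avoidance is enabled, skip nodes that have not sent a
// heartbeat within staleInterval.
if (avoidStaleNodes) {
  if (node.isStale(this.staleInterval)) {
    logNodeIsNotChosen(node, NodeNotChosenReason.NODE_STALE);
    return false;
  }
}
// ...
{code}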
2. When an exclusion condition is triggered, we can also record it in NameNode 
metrics, counting the total number of exclusions for each reason.
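The two points above can be combined in a small standalone model: the same two checks as the excerpt, where each rejection is both logged and counted per reason. This is a minimal sketch under stated assumptions, not the HDFS-17305 patch: `AvoidReason`, `SimpleNode`, `AvoidReasonCounter` and `NodeFilter` are hypothetical names, the enum only mirrors the two `NodeNotChosenReason` constants shown in the excerpt (the slow-node and load checks would add more), and the current time is passed in explicitly instead of being read from a monotonic clock.

{code:java}
// Hypothetical sketch; not the actual HDFS-17305 implementation.
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;
import java.util.logging.Logger;

/** Hypothetical reasons mirroring the two constants in the excerpt. */
enum AvoidReason {
  NOT_IN_SERVICE,
  NODE_STALE
}

/** Hypothetical, simplified stand-in for a DataNode descriptor. */
final class SimpleNode {
  final String name;
  final boolean inService;
  final long lastHeartbeatMs; // time of the last heartbeat, in ms

  SimpleNode(String name, boolean inService, long lastHeartbeatMs) {
    this.name = name;
    this.inService = inService;
    this.lastHeartbeatMs = lastHeartbeatMs;
  }

  /** Stale means no heartbeat for at least staleIntervalMs. */
  boolean isStale(long staleIntervalMs, long nowMs) {
    return nowMs - lastHeartbeatMs >= staleIntervalMs;
  }
}

/** Hypothetical thread-safe count of exclusions per reason. */
final class AvoidReasonCounter {
  private final Map<AvoidReason, LongAdder> counts = new EnumMap<>(AvoidReason.class);

  AvoidReasonCounter() {
    // Every key is added up front, so the map is never structurally modified
    // afterwards and can be read concurrently without locking.
    for (AvoidReason r : AvoidReason.values()) {
      counts.put(r, new LongAdder());
    }
  }

  void increment(AvoidReason reason) {
    counts.get(reason).increment();
  }

  long get(AvoidReason reason) {
    return counts.get(reason).sum();
  }

  long total() {
    long sum = 0;
    for (LongAdder adder : counts.values()) {
      sum += adder.sum();
    }
    return sum;
  }
}

/** Hypothetical filter applying the two checks from the excerpt, in order. */
final class NodeFilter {
  private static final Logger LOG = Logger.getLogger(NodeFilter.class.getName());
  private final AvoidReasonCounter counter;

  NodeFilter(AvoidReasonCounter counter) {
    this.counter = counter;
  }

  boolean isGoodNode(SimpleNode node, boolean avoidStaleNodes,
      long staleIntervalMs, long nowMs) {
    if (!node.inService) {
      return reject(node, AvoidReason.NOT_IN_SERVICE);
    }
    if (avoidStaleNodes && node.isStale(staleIntervalMs, nowMs)) {
      return reject(node, AvoidReason.NODE_STALE);
    }
    return true;
  }

  // Logs the reason (point 1) and counts it (point 2); always returns false.
  private boolean reject(SimpleNode node, AvoidReason reason) {
    LOG.fine(() -> "Datanode " + node.name + " is not chosen since " + reason);
    counter.increment(reason);
    return false;
  }
}
{code}

`LongAdder` keeps increments cheap when many handler threads reject nodes concurrently. Inside the NameNode, such counters would more likely be registered through Hadoop's metrics2 framework (for example as `MutableCounterLong` fields) so that they are exported over JMX with the other NameNode metrics; plain `java.util.concurrent` types are used here only to keep the sketch self-contained.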

  was:
Currently, the NameNode provides slow-node and load avoidance features, 
implemented mainly in the BlockPlacementPolicyDefault class.

1. When an exclusion condition is triggered, the NameNode (NN) logs why the 
DataNode was not chosen, and these logs can be checked to troubleshoot 
anomalies on the NN. The relevant code is:
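{code:java}
// ...
// Skip nodes that are not in service (e.g. decommissioning or in maintenance).
if (!node.isInService()) {
  logNodeIsNotChosen(node, NodeNotChosenReason.NOT_IN_SERVICE);
  return false;
}

// When stale-node avoidance is enabled, skip nodes that have not sent a
// heartbeat within staleInterval.
if (avoidStaleNodes) {
  if (node.isStale(this.staleInterval)) {
    logNodeIsNotChosen(node, NodeNotChosenReason.NODE_STALE);
    return false;
  }
}
// ...
{code}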
2. When an exclusion condition is triggered, we can also record it in NameNode 
metrics, counting the total number of exclusions for each reason.

3. These metrics can be collected with Prometheus and visualized in Grafana to 
observe how the cluster is selecting DataNodes.


> Add avoid datanode reason count related metrics to namenode.
> ------------------------------------------------------------
>
>                 Key: HDFS-17305
>                 URL: https://issues.apache.org/jira/browse/HDFS-17305
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: huangzhaobo99
>            Assignee: huangzhaobo99
>            Priority: Minor
>              Labels: pull-request-available



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
