huangzhaobo99 opened a new pull request, #6393:
URL: https://github.com/apache/hadoop/pull/6393
### Description of PR
JIRA: https://issues.apache.org/jira/browse/HDFS-17305
Currently, the NameNode can avoid slow DataNodes and heavily loaded DataNodes when choosing block placement targets. This logic is mainly implemented in the `BlockPlacementPolicyDefault` class.
1. When an exclusion condition is triggered, the NameNode logs why the node was not chosen, so anomalies in target selection can be diagnosed from the NameNode logs. The relevant code is:
```java
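...
// Reject nodes that are not in service (e.g. being decommissioned or in maintenance).
if (!node.isInService()) {
  logNodeIsNotChosen(node, NodeNotChosenReason.NOT_IN_SERVICE);
  return false;
}
// When stale-node avoidance is enabled, reject nodes whose last
// heartbeat is older than staleInterval.
if (avoidStaleNodes) {
  if (node.isStale(this.staleInterval)) {
    logNodeIsNotChosen(node, NodeNotChosenReason.NODE_STALE);
    return false;
  }
}
...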
```
2. With this change, each triggered exclusion is also recorded in metrics, so the total number of exclusions can be tracked.
3. These metrics can be collected with Prometheus and visualized in Grafana to observe how the cluster behaves when selecting DataNodes.
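To make items 2 and 3 concrete, here is a minimal sketch, assuming a plain in-process counter rather than Hadoop's metrics2 framework. Each time a node is rejected (where the excerpt above calls `logNodeIsNotChosen`), a per-reason counter is incremented, and the counters can be rendered in the Prometheus text exposition format. `ExclusionCounters`, `Node`, `isGoodNode`, and the metric name `excluded_datanodes_total` are hypothetical; the local `NodeNotChosenReason` enum only mirrors the two reasons shown in the excerpt and is not the Hadoop enum.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

// Sketch only: a plain in-process counter, NOT Hadoop's metrics2 API.
// All names below are hypothetical and chosen for illustration.
public class ExclusionMetricsSketch {

  // Local stand-in mirroring the two reasons in the excerpt; not Hadoop's enum.
  enum NodeNotChosenReason { NOT_IN_SERVICE, NODE_STALE }

  // Hypothetical stand-in for DatanodeDescriptor.
  static final class Node {
    final boolean inService;
    final long lastHeartbeatMs;

    Node(boolean inService, long lastHeartbeatMs) {
      this.inService = inService;
      this.lastHeartbeatMs = lastHeartbeatMs;
    }

    // Stale if the last heartbeat is at least staleIntervalMs old.
    boolean isStale(long staleIntervalMs, long nowMs) {
      return nowMs - lastHeartbeatMs >= staleIntervalMs;
    }
  }

  // One thread-safe counter per exclusion reason; the map itself is
  // only written in the constructor.
  static final class ExclusionCounters {
    private final Map<NodeNotChosenReason, LongAdder> counters =
        new EnumMap<>(NodeNotChosenReason.class);

    ExclusionCounters() {
      for (NodeNotChosenReason r : NodeNotChosenReason.values()) {
        counters.put(r, new LongAdder());
      }
    }

    void record(NodeNotChosenReason reason) {
      counters.get(reason).increment();
    }

    long count(NodeNotChosenReason reason) {
      return counters.get(reason).sum();
    }

    long total() {
      long sum = 0;
      for (LongAdder adder : counters.values()) {
        sum += adder.sum();
      }
      return sum;
    }

    // Renders the counters in the Prometheus text exposition format.
    String toPrometheusText(String metricName) {
      StringBuilder sb = new StringBuilder();
      sb.append("# TYPE ").append(metricName).append(" counter\n");
      for (Map.Entry<NodeNotChosenReason, LongAdder> e : counters.entrySet()) {
        sb.append(metricName).append("{reason=\"").append(e.getKey().name())
            .append("\"} ").append(e.getValue().sum()).append('\n');
      }
      return sb.toString();
    }
  }

  // Mirrors the excerpt's checks, counting each exclusion at the point
  // where the original code calls logNodeIsNotChosen(...).
  static boolean isGoodNode(Node node, boolean avoidStaleNodes,
      long staleIntervalMs, long nowMs, ExclusionCounters metrics) {
    if (!node.inService) {
      metrics.record(NodeNotChosenReason.NOT_IN_SERVICE);
      return false;
    }
    if (avoidStaleNodes && node.isStale(staleIntervalMs, nowMs)) {
      metrics.record(NodeNotChosenReason.NODE_STALE);
      return false;
    }
    return true;
  }

  public static void main(String[] args) {
    ExclusionCounters metrics = new ExclusionCounters();
    long now = 100_000L;
    isGoodNode(new Node(false, now), true, 30_000L, now, metrics);          // not in service
    isGoodNode(new Node(true, now - 60_000L), true, 30_000L, now, metrics); // stale
    isGoodNode(new Node(true, now), true, 30_000L, now, metrics);           // accepted
    System.out.print(metrics.toPrometheusText("excluded_datanodes_total"));
  }
}
```

Running `main` prints a `# TYPE` line followed by a count of 1 for each of the two reasons. A `LongAdder` per reason keeps increments cheap when many block allocations run concurrently; the patch's actual metric names and registration may differ.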
### How was this patch tested?
Added a unit test, `TestNameNodeMetrics#testAvoidTargetDataNodeMetrics`.