[
https://issues.apache.org/jira/browse/HDFS-17305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17801221#comment-17801221
]
ASF GitHub Bot commented on HDFS-17305:
---------------------------------------
huangzhaobo99 opened a new pull request, #6393:
URL: https://github.com/apache/hadoop/pull/6393
### Description of PR
JIRA: https://issues.apache.org/jira/browse/HDFS-17305
HDFS currently has slow-node and load avoidance functions, implemented mainly in
the BlockPlacementPolicyDefault class.
1. When an exclusion condition is triggered, the NameNode (nn) logs the reason.
These logs can be used to troubleshoot anomalies in the NameNode. The
code is as follows:
```java
...
if (!node.isInService()) {
logNodeIsNotChosen(node, NodeNotChosenReason.NOT_IN_SERVICE);
return false;
}
if (avoidStaleNodes) {
if (node.isStale(this.staleInterval)) {
logNodeIsNotChosen(node, NodeNotChosenReason.NODE_STALE);
return false;
}
}
...
```
2. When an exclusion condition is triggered, we can also record it in a metric
and count the total number of exclusions.
3. These metrics can then be viewed through Prometheus + Grafana to observe how
the cluster is behaving when it selects datanodes.
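
The counting described in step 2 can be sketched as follows. This is a minimal illustration using plain JDK counters, not the code in this PR and not the Hadoop metrics API: the class `AvoidReasonCounters`, the `Reason` enum, and the method `isGoodTarget` are hypothetical names, and only the two reasons shown in the snippet above are modeled.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical per-reason exclusion counters (illustration only). */
class AvoidReasonCounters {
  /** Assumed subset of reasons, named after the constants in the snippet above. */
  enum Reason { NOT_IN_SERVICE, NODE_STALE }

  // One thread-safe counter per reason; the map itself is never modified after construction.
  private final Map<Reason, LongAdder> counts = new EnumMap<>(Reason.class);

  AvoidReasonCounters() {
    for (Reason r : Reason.values()) {
      counts.put(r, new LongAdder());
    }
  }

  /** Record one exclusion for the given reason. */
  void record(Reason reason) {
    counts.get(reason).increment();
  }

  /** Number of exclusions recorded for one reason. */
  long get(Reason reason) {
    return counts.get(reason).sum();
  }

  /** Total number of exclusions across all reasons. */
  long total() {
    long sum = 0;
    for (LongAdder adder : counts.values()) {
      sum += adder.sum();
    }
    return sum;
  }

  /**
   * Simplified stand-in for the placement check: returns false when the
   * node is excluded and counts the reason at the same point where the
   * original code logs it.
   */
  boolean isGoodTarget(boolean inService, boolean stale, boolean avoidStaleNodes) {
    if (!inService) {
      record(Reason.NOT_IN_SERVICE);
      return false;
    }
    if (avoidStaleNodes && stale) {
      record(Reason.NODE_STALE);
      return false;
    }
    return true;
  }
}
```

In a real NameNode the counters would presumably be registered with the existing metrics system so that they can be scraped; the sketch only shows where each increment would sit relative to the log call.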
### How was this patch tested?
Added the unit test TestNameNodeMetrics#testAvoidTargetDataNodeMetrics.
> Add avoid datanode reason count related metrics to namenode.
> ------------------------------------------------------------
>
> Key: HDFS-17305
> URL: https://issues.apache.org/jira/browse/HDFS-17305
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: huangzhaobo99
> Assignee: huangzhaobo99
> Priority: Minor
>
> HDFS currently has slow-node and load avoidance functions, implemented mainly in
> the BlockPlacementPolicyDefault class.
> 1. When an exclusion condition is triggered, the NameNode (nn) logs the reason.
> These logs can be used to troubleshoot anomalies in the NameNode. The
> code is as follows:
> {code:java}
> ...
> if (!node.isInService()) {
> logNodeIsNotChosen(node, NodeNotChosenReason.NOT_IN_SERVICE);
> return false;
> }
> if (avoidStaleNodes) {
> if (node.isStale(this.staleInterval)) {
> logNodeIsNotChosen(node, NodeNotChosenReason.NODE_STALE);
> return false;
> }
> }
> ...{code}
> 2. When an exclusion condition is triggered, we can also record it in a metric
> and count the total number of exclusions.
> 3. These metrics can then be viewed through Prometheus + Grafana to observe how
> the cluster is behaving when it selects datanodes.