[
https://issues.apache.org/jira/browse/HDFS-16678?focusedWorklogId=793863&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-793863
]
ASF GitHub Bot logged work on HDFS-16678:
-----------------------------------------
Author: ASF GitHub Bot
Created on: 21/Jul/22 17:31
Start Date: 21/Jul/22 17:31
Worklog Time Spent: 10m
Work Description: goiri commented on code in PR #4606:
URL: https://github.com/apache/hadoop/pull/4606#discussion_r926936268
##########
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java:
##########
@@ -544,28 +548,30 @@ public String getNodeUsage() {
     final Map<String, Map<String, Object>> info = new HashMap<>();
     try {
-      RouterRpcServer rpcServer = this.router.getRpcServer();
-      DatanodeInfo[] live = rpcServer.getDatanodeReport(
-          DatanodeReportType.LIVE, false, timeOut);
-
-      if (live.length > 0) {
-        float totalDfsUsed = 0;
-        float[] usages = new float[live.length];
-        int i = 0;
-        for (DatanodeInfo dn : live) {
-          usages[i++] = dn.getDfsUsedPercent();
-          totalDfsUsed += dn.getDfsUsedPercent();
-        }
-        totalDfsUsed /= live.length;
-        Arrays.sort(usages);
-        median = usages[usages.length / 2];
-        max = usages[usages.length - 1];
-        min = usages[0];
-
-        for (i = 0; i < usages.length; i++) {
-          dev += (usages[i] - totalDfsUsed) * (usages[i] - totalDfsUsed);
+      if (this.enableGetDNUsage) {
Review Comment:
I would do:
```
DatanodeInfo[] live = null;
if (this.enableGetDNUsage) {
  RouterRpcServer rpcServer = this.router.getRpcServer();
  live = rpcServer.getDatanodeReport(DatanodeReportType.LIVE,
      false, timeOut);
} else {
  LOG.debug("Getting information is disabled."); // similar message
}
if (live != null && live.length > 0) {
```
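For reference, here is the suggestion expanded into a minimal sketch of the whole block (hedged: it reuses the median/max/min/dev locals that getNodeUsage() already declares, and the log message is just the reviewer's placeholder):
```
DatanodeInfo[] live = null;
if (this.enableGetDNUsage) {
  RouterRpcServer rpcServer = this.router.getRpcServer();
  live = rpcServer.getDatanodeReport(DatanodeReportType.LIVE, false, timeOut);
} else {
  LOG.debug("Getting node usage is disabled."); // placeholder message
}

if (live != null && live.length > 0) {
  float totalDfsUsed = 0;
  float[] usages = new float[live.length];
  int i = 0;
  for (DatanodeInfo dn : live) {
    usages[i++] = dn.getDfsUsedPercent();
    totalDfsUsed += dn.getDfsUsedPercent();
  }
  totalDfsUsed /= live.length;
  Arrays.sort(usages);
  median = usages[usages.length / 2];
  max = usages[usages.length - 1];
  min = usages[0];
  for (i = 0; i < usages.length; i++) {
    dev += (usages[i] - totalDfsUsed) * (usages[i] - totalDfsUsed);
  }
  dev = (float) Math.sqrt(dev / usages.length);
}
```
Hoisting live out of the conditional keeps the stats computation at one nesting level instead of wrapping the entire block in the feature flag.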
##########
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java:
##########
@@ -544,28 +548,30 @@ public String getNodeUsage() {
     final Map<String, Map<String, Object>> info = new HashMap<>();
     try {
-      RouterRpcServer rpcServer = this.router.getRpcServer();
-      DatanodeInfo[] live = rpcServer.getDatanodeReport(
-          DatanodeReportType.LIVE, false, timeOut);
-
-      if (live.length > 0) {
-        float totalDfsUsed = 0;
-        float[] usages = new float[live.length];
-        int i = 0;
-        for (DatanodeInfo dn : live) {
-          usages[i++] = dn.getDfsUsedPercent();
-          totalDfsUsed += dn.getDfsUsedPercent();
-        }
-        totalDfsUsed /= live.length;
-        Arrays.sort(usages);
-        median = usages[usages.length / 2];
-        max = usages[usages.length - 1];
-        min = usages[0];
-
-        for (i = 0; i < usages.length; i++) {
-          dev += (usages[i] - totalDfsUsed) * (usages[i] - totalDfsUsed);
+      if (this.enableGetDNUsage) {
+        RouterRpcServer rpcServer = this.router.getRpcServer();
+        DatanodeInfo[] live = rpcServer.getDatanodeReport(
+            DatanodeReportType.LIVE, false, timeOut);
+
+        if (live.length > 0) {
+          float totalDfsUsed = 0;
+          float[] usages = new float[live.length];
+          int i = 0;
+          for (DatanodeInfo dn : live) {
+            usages[i++] = dn.getDfsUsedPercent();
Review Comment:
What is the expensive part of this whole block? This line?
##########
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java:
##########
@@ -544,28 +548,30 @@ public String getNodeUsage() {
     final Map<String, Map<String, Object>> info = new HashMap<>();
     try {
-      RouterRpcServer rpcServer = this.router.getRpcServer();
-      DatanodeInfo[] live = rpcServer.getDatanodeReport(
-          DatanodeReportType.LIVE, false, timeOut);
-
-      if (live.length > 0) {
-        float totalDfsUsed = 0;
-        float[] usages = new float[live.length];
-        int i = 0;
-        for (DatanodeInfo dn : live) {
-          usages[i++] = dn.getDfsUsedPercent();
-          totalDfsUsed += dn.getDfsUsedPercent();
-        }
-        totalDfsUsed /= live.length;
-        Arrays.sort(usages);
-        median = usages[usages.length / 2];
-        max = usages[usages.length - 1];
-        min = usages[0];
-
-        for (i = 0; i < usages.length; i++) {
-          dev += (usages[i] - totalDfsUsed) * (usages[i] - totalDfsUsed);
+      if (this.enableGetDNUsage) {
+        RouterRpcServer rpcServer = this.router.getRpcServer();
+        DatanodeInfo[] live = rpcServer.getDatanodeReport(
+            DatanodeReportType.LIVE, false, timeOut);
+
+        if (live.length > 0) {
+          float totalDfsUsed = 0;
+          float[] usages = new float[live.length];
+          int i = 0;
+          for (DatanodeInfo dn : live) {
+            usages[i++] = dn.getDfsUsedPercent();
+            totalDfsUsed += dn.getDfsUsedPercent();
+          }
+          totalDfsUsed /= live.length;
+          Arrays.sort(usages);
+          median = usages[usages.length / 2];
+          max = usages[usages.length - 1];
+          min = usages[0];
+
+          for (i = 0; i < usages.length; i++) {
+            dev += (usages[i] - totalDfsUsed) * (usages[i] - totalDfsUsed);
+          }
+          dev = (float) Math.sqrt(dev / usages.length);
Review Comment:
Apache Commons Math has a nice StandardDeviation utility.
It might be good to use it directly.
https://commons.apache.org/proper/commons-math/javadocs/api-3.6/org/apache/commons/math3/stat/descriptive/moment/StandardDeviation.html
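For illustration, a minimal sketch of that substitution (hedged: the float[] usages array from the block above is copied into a double[], since StandardDeviation.evaluate() takes doubles, and isBiasCorrected=false matches the divide-by-N computation in the original loop):
```
import org.apache.commons.math3.stat.descriptive.moment.StandardDeviation;

// Copy the float[] usages into a double[] for evaluate().
double[] values = new double[usages.length];
for (int j = 0; j < usages.length; j++) {
  values[j] = usages[j];
}
// isBiasCorrected=false divides by N, matching the original
// dev = sqrt(sum((x - mean)^2) / N) calculation.
dev = (float) new StandardDeviation(false).evaluate(values);
```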
##########
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/metrics/RBFMetrics.java:
##########
@@ -544,28 +548,30 @@ public String getNodeUsage() {
     final Map<String, Map<String, Object>> info = new HashMap<>();
     try {
-      RouterRpcServer rpcServer = this.router.getRpcServer();
-      DatanodeInfo[] live = rpcServer.getDatanodeReport(
-          DatanodeReportType.LIVE, false, timeOut);
-
-      if (live.length > 0) {
-        float totalDfsUsed = 0;
-        float[] usages = new float[live.length];
-        int i = 0;
-        for (DatanodeInfo dn : live) {
-          usages[i++] = dn.getDfsUsedPercent();
-          totalDfsUsed += dn.getDfsUsedPercent();
-        }
-        totalDfsUsed /= live.length;
-        Arrays.sort(usages);
-        median = usages[usages.length / 2];
-        max = usages[usages.length - 1];
-        min = usages[0];
-
-        for (i = 0; i < usages.length; i++) {
-          dev += (usages[i] - totalDfsUsed) * (usages[i] - totalDfsUsed);
+      if (this.enableGetDNUsage) {
+        RouterRpcServer rpcServer = this.router.getRpcServer();
+        DatanodeInfo[] live = rpcServer.getDatanodeReport(
+            DatanodeReportType.LIVE, false, timeOut);
+
+        if (live.length > 0) {
+          float totalDfsUsed = 0;
+          float[] usages = new float[live.length];
+          int i = 0;
+          for (DatanodeInfo dn : live) {
+            usages[i++] = dn.getDfsUsedPercent();
I guess it is rpcServer.getDatanodeReport(), actually.
##########
hadoop-hdfs-project/hadoop-hdfs-rbf/src/main/java/org/apache/hadoop/hdfs/server/federation/router/RBFConfigKeys.java:
##########
@@ -315,6 +315,9 @@ public class RBFConfigKeys extends CommonConfigurationKeysPublic {
       FEDERATION_ROUTER_PREFIX + "dn-report.cache-expire";
   public static final long DN_REPORT_CACHE_EXPIRE_MS_DEFAULT =
       TimeUnit.SECONDS.toMillis(10);
+  public static final String DFS_ROUTER_ENABLE_GET_DN_USAGE_KEY =
+      FEDERATION_ROUTER_PREFIX + "enable.get.dn.usage";
+  public static final boolean DFS_ROUTER_ENABLE_GET_DN_USAGE_DEFAULT = true;
Review Comment:
Can we add a test?
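A hedged sketch of what such a test could look like; startRouterMetrics() is a placeholder for whatever setup the existing RBF metrics tests already use to obtain an RBFMetrics bean:
```
import static org.junit.Assert.assertNotNull;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.server.federation.metrics.RBFMetrics;
import org.apache.hadoop.hdfs.server.federation.router.RBFConfigKeys;
import org.junit.Test;

@Test
public void testNodeUsageWithGetDNUsageDisabled() throws Exception {
  Configuration conf = new Configuration();
  conf.setBoolean(RBFConfigKeys.DFS_ROUTER_ENABLE_GET_DN_USAGE_KEY, false);
  // startRouterMetrics() is hypothetical; a real test would reuse the
  // existing federation test harness to start a Router with this conf.
  RBFMetrics metrics = startRouterMetrics(conf);
  // With collection disabled, getNodeUsage() should skip
  // getDatanodeReport() and report the zeroed default stats.
  String nodeUsage = metrics.getNodeUsage();
  assertNotNull(nodeUsage);
}
```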
Issue Time Tracking
-------------------
Worklog Id: (was: 793863)
Time Spent: 0.5h (was: 20m)
> RBF supports disable getNodeUsage() in RBFMetrics
> -------------------------------------------------
>
> Key: HDFS-16678
> URL: https://issues.apache.org/jira/browse/HDFS-16678
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: ZanderXu
> Assignee: ZanderXu
> Priority: Major
> Labels: pull-request-available
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> In our prod environment, we try to collect RBF metrics every 15s through
> jmx_exporter, and we found that the collection task often failed.
> After tracing, we found that the collection task is blocked at getNodeUsage()
> in RBFMetrics, because it collects every datanode's usage from the
> downstream nameservices. This is a very expensive and almost useless
> operation, because in most scenarios each nameservice contains almost the
> same DNs: we can get the data usage from any one nameservice instead of
> from RBF. So I feel that RBF should support disabling getNodeUsage() in
> RBFMetrics.
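If the key lands as proposed, operators could turn the collection off along these lines (assumption: FEDERATION_ROUTER_PREFIX resolves to dfs.federation.router., making the full key dfs.federation.router.enable.get.dn.usage):
```
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.server.federation.router.RBFConfigKeys;

// Assumption: FEDERATION_ROUTER_PREFIX is "dfs.federation.router.",
// so this sets "dfs.federation.router.enable.get.dn.usage" to false.
Configuration conf = new Configuration();
conf.setBoolean(RBFConfigKeys.DFS_ROUTER_ENABLE_GET_DN_USAGE_KEY, false);
```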