[
https://issues.apache.org/jira/browse/HADOOP-18920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775681#comment-17775681
]
ASF GitHub Bot commented on HADOOP-18920:
-----------------------------------------
ZanderXu commented on code in PR #6146:
URL: https://github.com/apache/hadoop/pull/6146#discussion_r1360457921
##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeysPublic.java:
##########
@@ -504,6 +504,10 @@ public class CommonConfigurationKeysPublic {
"ipc.server.log.slow.rpc";
public static final boolean IPC_SERVER_LOG_SLOW_RPC_DEFAULT = false;
+ public static final String IPC_SERVER_LOG_SLOW_RPC_THRESHOLD_MS_KEY =
+ "ipc.server.log.slow.rpc-threshold-ms";
+ public static final long IPC_SERVER_LOG_SLOW_RPC_THRESHOLD_MS_DEFAULT = 10000L;
Review Comment:
This default value should be 0, so that the slow-log logic stays the same as before.
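A hedged sketch (editor's illustration, not part of the patch) of why a 0 ms default keeps the previous behaviour: with a threshold of 0 the extra check is always satisfied, so only the original three-sigma rule decides whether an RPC is logged as slow, while a 10000 ms default would suppress logs that were emitted before.
// Illustrative only; the class and method names below do not come from the PR.
public class SlowRpcDefaultSketch {
  static boolean isSlow(long processingTime, double threeSigma, long thresholdMs) {
    return processingTime > threeSigma && processingTime > thresholdMs;
  }
  public static void main(String[] args) {
    System.out.println(isSlow(5, 2.0, 0L));     // true: same outcome as the pre-patch rule
    System.out.println(isSlow(5, 2.0, 10000L)); // false: a 10 s default hides this call
  }
}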
##########
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ipc/TestProtoBufRpc.java:
##########
@@ -353,10 +353,11 @@ public void testExtraLongRpc() throws Exception {
@Test(timeout = 12000)
public void testLogSlowRPC() throws IOException, ServiceException,
TimeoutException, InterruptedException {
- //No test with legacy
+ // No test with legacy.
Review Comment:
This change is unnecessary.
##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeysPublic.java:
##########
@@ -504,6 +504,10 @@ public class CommonConfigurationKeysPublic {
"ipc.server.log.slow.rpc";
public static final boolean IPC_SERVER_LOG_SLOW_RPC_DEFAULT = false;
+ public static final String IPC_SERVER_LOG_SLOW_RPC_THRESHOLD_MS_KEY =
+ "ipc.server.log.slow.rpc-threshold-ms";
+ public static final long IPC_SERVER_LOG_SLOW_RPC_THRESHOLD_MS_DEFAULT = 10000L;
Review Comment:
We should add this configuration to core-default.xml, for example as sketched below.
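A hedged sketch of what the core-default.xml entry might look like; the description text and the 0 ms value follow the review comments above, not the merged patch:
<property>
  <name>ipc.server.log.slow.rpc-threshold-ms</name>
  <value>0</value>
  <description>
    Threshold in milliseconds used together with ipc.server.log.slow.rpc to
    decide whether an RPC is logged as slow. A value of 0 keeps the previous
    three-sigma-only behaviour.
  </description>
</property>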
##########
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ipc/TestProtoBufRpc.java:
##########
@@ -365,12 +366,18 @@ public void testLogSlowRPC() throws IOException, ServiceException,
}
}
- // Ensure RPC metrics are updated
+ // Ensure RPC metrics are updated.
Review Comment:
Same here: this change is unnecessary.
##########
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ipc/TestProtoBufRpc.java:
##########
@@ -353,10 +353,11 @@ public void testExtraLongRpc() throws Exception {
@Test(timeout = 12000)
public void testLogSlowRPC() throws IOException, ServiceException,
TimeoutException, InterruptedException {
- //No test with legacy
+ // No test with legacy.
assumeFalse(testWithLegacy);
+ server.setLogSlowRPCThresholdMs(SLEEP_DURATION);
TestRpcService2 client = getClient2();
- // make 10 K fast calls
+ // Make 10 K fast calls.
Review Comment:
Same here: this change is unnecessary.
##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java:
##########
@@ -587,15 +601,15 @@ void logSlowRpcCalls(String methodName, Call call,
final double threeSigma = rpcMetrics.getProcessingMean() +
(rpcMetrics.getProcessingStdDev() * deviation);
- long processingTime =
- details.get(Timing.PROCESSING, rpcMetrics.getMetricsTimeUnit());
+ final TimeUnit metricsTimeUnit = rpcMetrics.getMetricsTimeUnit();
+ long processingTime = details.get(Timing.PROCESSING, metricsTimeUnit);
+ long logSlowRPCThresholdTime =
+ metricsTimeUnit.convert(getLogSlowRPCThresholdMs(), TimeUnit.MILLISECONDS);
if ((rpcMetrics.getProcessingSampleCount() > minSampleSize) &&
- (processingTime > threeSigma)) {
- LOG.warn(
- "Slow RPC : {} took {} {} to process from client {},"
- + " the processing detail is {}",
- methodName, processingTime, rpcMetrics.getMetricsTimeUnit(), call,
- details.toString());
+ (processingTime > threeSigma) &&
+ (processingTime > logSlowRPCThresholdTime)) {
+ LOG.warn("Slow RPC : {} took {} {} to process from client {}, the
processing detail is {}",
+ methodName, processingTime, metricsTimeUnit, call,
details.toString());
Review Comment:
We can also print the slow threshold in this log message, for example:
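A possible shape of that suggestion, as a hedged sketch (the wording and argument order are illustrative, not the merged code; the variables are the ones already present in this hunk):
LOG.warn("Slow RPC : {} took {} {} to process from client {},"
    + " the processing detail is {}, and the threshold is {} {}",
    methodName, processingTime, metricsTimeUnit, call, details.toString(),
    logSlowRPCThresholdTime, metricsTimeUnit);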
##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java:
##########
@@ -587,15 +601,15 @@ void logSlowRpcCalls(String methodName, Call call,
final double threeSigma = rpcMetrics.getProcessingMean() +
(rpcMetrics.getProcessingStdDev() * deviation);
- long processingTime =
- details.get(Timing.PROCESSING, rpcMetrics.getMetricsTimeUnit());
+ final TimeUnit metricsTimeUnit = rpcMetrics.getMetricsTimeUnit();
+ long processingTime = details.get(Timing.PROCESSING, metricsTimeUnit);
+ long logSlowRPCThresholdTime =
+ metricsTimeUnit.convert(getLogSlowRPCThresholdMs(), TimeUnit.MILLISECONDS);
Review Comment:
There is no need to convert this threshold every time. We can convert it once, when it is
initialized, using this time unit, because the time unit will not change. For example:
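A hedged sketch of that idea (the field is illustrative; the setter name matches the one used in the test diff above): convert the millisecond value into the metrics time unit once, when it is set, and reuse the cached value in logSlowRpcCalls instead of calling convert on every invocation.
private volatile long logSlowRPCThresholdTime;

void setLogSlowRPCThresholdMs(long logSlowRPCThresholdMs) {
  // rpcMetrics.getMetricsTimeUnit() does not change at runtime, so converting
  // here avoids the per-call TimeUnit.convert in logSlowRpcCalls.
  this.logSlowRPCThresholdTime = rpcMetrics.getMetricsTimeUnit()
      .convert(logSlowRPCThresholdMs, TimeUnit.MILLISECONDS);
}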
> RPC Metrics : Optimize logic for log slow RPCs
> ----------------------------------------------
>
> Key: HADOOP-18920
> URL: https://issues.apache.org/jira/browse/HADOOP-18920
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: Haiyang Hu
> Assignee: Haiyang Hu
> Priority: Major
> Labels: pull-request-available
>
> HADOOP-12325 implemented a capability where "slow" RPCs are logged in the NN log.
> The current logic treats "slow" RPCs as those whose processing time is more
> than 3 standard deviations above the mean.
> However, in practice many slow RPC logs are emitted, and sometimes RPCs with a
> processing time of 1 ms are declared slow, which is not in line with actual
> expectations.
> Therefore, consider optimizing the slow RPC logic and adding a
> `logSlowRPCThresholdMs` variable to judge whether the current RPC is slow, so
> that only the expected slow RPC logs are emitted.
> For `logSlowRPCThresholdMs`, we can support dynamic refresh to facilitate
> adjustments based on the actual operating conditions of the HDFS cluster.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)