[ 
https://issues.apache.org/jira/browse/HADOOP-18920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17775681#comment-17775681
 ] 

ASF GitHub Bot commented on HADOOP-18920:
-----------------------------------------

ZanderXu commented on code in PR #6146:
URL: https://github.com/apache/hadoop/pull/6146#discussion_r1360457921


##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeysPublic.java:
##########
@@ -504,6 +504,10 @@ public class CommonConfigurationKeysPublic {
                                                 "ipc.server.log.slow.rpc";
   public static final boolean IPC_SERVER_LOG_SLOW_RPC_DEFAULT = false;
 
+  public static final String IPC_SERVER_LOG_SLOW_RPC_THRESHOLD_MS_KEY =
+      "ipc.server.log.slow.rpc-threshold-ms";
+  public static final long IPC_SERVER_LOG_SLOW_RPC_THRESHOLD_MS_DEFAULT = 
10000L;

Review Comment:
   This default value should be 0, so that the slow log logic is same as before.



##########
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ipc/TestProtoBufRpc.java:
##########
@@ -353,10 +353,11 @@ public void testExtraLongRpc() throws Exception {
   @Test(timeout = 12000)
   public void testLogSlowRPC() throws IOException, ServiceException,
       TimeoutException, InterruptedException {
-    //No test with legacy
+    // No test with legacy.

Review Comment:
   this change is unnecessary.



##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/CommonConfigurationKeysPublic.java:
##########
@@ -504,6 +504,10 @@ public class CommonConfigurationKeysPublic {
                                                 "ipc.server.log.slow.rpc";
   public static final boolean IPC_SERVER_LOG_SLOW_RPC_DEFAULT = false;
 
+  public static final String IPC_SERVER_LOG_SLOW_RPC_THRESHOLD_MS_KEY =
+      "ipc.server.log.slow.rpc-threshold-ms";
+  public static final long IPC_SERVER_LOG_SLOW_RPC_THRESHOLD_MS_DEFAULT = 
10000L;

Review Comment:
   We should add this configure into core-default.xml



##########
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ipc/TestProtoBufRpc.java:
##########
@@ -365,12 +366,18 @@ public void testLogSlowRPC() throws IOException, 
ServiceException,
       }
     }
 
-    // Ensure RPC metrics are updated
+    // Ensure RPC metrics are updated.

Review Comment:
   here is too.



##########
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/ipc/TestProtoBufRpc.java:
##########
@@ -353,10 +353,11 @@ public void testExtraLongRpc() throws Exception {
   @Test(timeout = 12000)
   public void testLogSlowRPC() throws IOException, ServiceException,
       TimeoutException, InterruptedException {
-    //No test with legacy
+    // No test with legacy.
     assumeFalse(testWithLegacy);
+    server.setLogSlowRPCThresholdMs(SLEEP_DURATION);
     TestRpcService2 client = getClient2();
-    // make 10 K fast calls
+    // Make 10 K fast calls.

Review Comment:
   here is too.



##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java:
##########
@@ -587,15 +601,15 @@ void logSlowRpcCalls(String methodName, Call call,
     final double threeSigma = rpcMetrics.getProcessingMean() +
         (rpcMetrics.getProcessingStdDev() * deviation);
 
-    long processingTime =
-            details.get(Timing.PROCESSING, rpcMetrics.getMetricsTimeUnit());
+    final TimeUnit metricsTimeUnit = rpcMetrics.getMetricsTimeUnit();
+    long processingTime = details.get(Timing.PROCESSING, metricsTimeUnit);
+    long logSlowRPCThresholdTime =
+        metricsTimeUnit.convert(getLogSlowRPCThresholdMs(), 
TimeUnit.MILLISECONDS);
     if ((rpcMetrics.getProcessingSampleCount() > minSampleSize) &&
-        (processingTime > threeSigma)) {
-      LOG.warn(
-          "Slow RPC : {} took {} {} to process from client {},"
-              + " the processing detail is {}",
-          methodName, processingTime, rpcMetrics.getMetricsTimeUnit(), call,
-          details.toString());
+        (processingTime > threeSigma) &&
+        (processingTime > logSlowRPCThresholdTime)) {
+      LOG.warn("Slow RPC : {} took {} {} to process from client {}, the 
processing detail is {}",
+          methodName, processingTime, metricsTimeUnit, call, 
details.toString());

Review Comment:
   Can print the slow threshold.



##########
hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/ipc/Server.java:
##########
@@ -587,15 +601,15 @@ void logSlowRpcCalls(String methodName, Call call,
     final double threeSigma = rpcMetrics.getProcessingMean() +
         (rpcMetrics.getProcessingStdDev() * deviation);
 
-    long processingTime =
-            details.get(Timing.PROCESSING, rpcMetrics.getMetricsTimeUnit());
+    final TimeUnit metricsTimeUnit = rpcMetrics.getMetricsTimeUnit();
+    long processingTime = details.get(Timing.PROCESSING, metricsTimeUnit);
+    long logSlowRPCThresholdTime =
+        metricsTimeUnit.convert(getLogSlowRPCThresholdMs(), 
TimeUnit.MILLISECONDS);

Review Comment:
   There is no need to covert this threshold every time. We can convert this 
threshold when it is initialized by this timeunit, because this timeunit will 
not be changed.





> RPC Metrics : Optimize logic for log slow RPCs
> ----------------------------------------------
>
>                 Key: HADOOP-18920
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18920
>             Project: Hadoop Common
>          Issue Type: Improvement
>            Reporter: Haiyang Hu
>            Assignee: Haiyang Hu
>            Priority: Major
>              Labels: pull-request-available
>
> HADOOP-12325 implement a capability where "slow" RPCs are logged in NN log.
> Current processing logic is the "slow" RPCs are to be those whose processing 
> time is outside 3 standard deviation.
> However, in practice it is found that many logs of slow rpc are currently 
> output, and sometimes RPCs with a processing time of 1ms are also declared as 
> slow, this is not in line with actual expectations.
> Therefore, consider optimize the logic conditions of slow RPC and add a 
> `logSlowRPCThresholdMs` variable to judge whether the current RPCas slow so 
> that the expected slow RPC log can be logger.
> for `logSlowRPCThresholdMs`, we can support dynamic refresh to facilitate 
> adjustments based on the actual operating conditions of the hdfs cluster.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to