turboFei commented on code in PR #3008:
URL: https://github.com/apache/celeborn/pull/3008#discussion_r1896037541


##########
common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala:
##########
@@ -4884,6 +4886,23 @@ object CelebornConf extends Logging {
       .timeConf(TimeUnit.MILLISECONDS)
       .createWithDefaultString("3s")
 
+  val RPC_TIMEOUT_RETRY_WAIT: ConfigEntry[Long] =
+    buildConf("celeborn.rpc.timeoutRetryWait")
+      .categories("network")
+      .version("0.6.0")
+      .doc("Wait time before next retry if RpcTimeoutException.")

Review Comment:
   nit: `if RpcTimeoutException` => `on RpcTimeoutException`.



##########
common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala:
##########
@@ -4884,6 +4886,23 @@ object CelebornConf extends Logging {
       .timeConf(TimeUnit.MILLISECONDS)
       .createWithDefaultString("3s")
 
+  val RPC_TIMEOUT_RETRY_WAIT: ConfigEntry[Long] =
+    buildConf("celeborn.rpc.timeoutRetryWait")
+      .categories("network")
+      .version("0.6.0")
+      .doc("Wait time before next retry if RpcTimeoutException.")
+      .timeConf(TimeUnit.MILLISECONDS)
+      .createWithDefaultString("1s")
+
+  val CLIENT_CALL_LIFECYCLEMANAGER_MAX_RETRIES: ConfigEntry[Int] =
+    buildConf("celeborn.client.callLifecycleManager.maxRetries")
+      .withAlternative("celeborn.callLifecycleManager.maxRetries")

Review Comment:
    Seems there is no legacy config `celeborn.callLifecycleManager.maxRetries`, 
do not need `withAlternative`?



##########
client/src/main/java/org/apache/celeborn/client/ShuffleClientImpl.java:
##########
@@ -20,10 +20,7 @@
 import java.io.IOException;
 import java.nio.ByteBuffer;
 import java.util.*;
-import java.util.concurrent.Callable;
-import java.util.concurrent.ConcurrentHashMap;
-import java.util.concurrent.ExecutorService;
-import java.util.concurrent.TimeUnit;
+import java.util.concurrent.*;

Review Comment:
   seems unnecessary change? I do not see new concurrent class involved in this 
class.



##########
common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala:
##########
@@ -4884,6 +4886,23 @@ object CelebornConf extends Logging {
       .timeConf(TimeUnit.MILLISECONDS)
       .createWithDefaultString("3s")
 
+  val RPC_TIMEOUT_RETRY_WAIT: ConfigEntry[Long] =
+    buildConf("celeborn.rpc.timeoutRetryWait")
+      .categories("network")
+      .version("0.6.0")
+      .doc("Wait time before next retry if RpcTimeoutException.")
+      .timeConf(TimeUnit.MILLISECONDS)
+      .createWithDefaultString("1s")
+
+  val CLIENT_CALL_LIFECYCLEMANAGER_MAX_RETRIES: ConfigEntry[Int] =
+    buildConf("celeborn.client.callLifecycleManager.maxRetries")
+      .withAlternative("celeborn.callLifecycleManager.maxRetries")

Review Comment:
   Is it possible to reuse CLIENT_RPC_MAX_RETIRES?
   ```
     val CLIENT_RPC_MAX_RETIRES: ConfigEntry[Int] =
       buildConf("celeborn.client.rpc.maxRetries")
         .categories("client")
         .version("0.3.2")
         .doc("Max RPC retry times in LifecycleManager.")
         .intConf
         .createWithDefault(3)
   ```



##########
common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala:
##########
@@ -4884,6 +4886,23 @@ object CelebornConf extends Logging {
       .timeConf(TimeUnit.MILLISECONDS)
       .createWithDefaultString("3s")
 
+  val RPC_TIMEOUT_RETRY_WAIT: ConfigEntry[Long] =
+    buildConf("celeborn.rpc.timeoutRetryWait")

Review Comment:
   Seems `celeborn.rpc.retryWait` is enough.



##########
client/src/main/java/org/apache/celeborn/client/ShuffleClientImpl.java:
##########
@@ -1700,13 +1700,12 @@ private void mapEndInternal(
       throws IOException {
     final String mapKey = Utils.makeMapKey(shuffleId, mapId, attemptId);
     PushState pushState = getPushState(mapKey);
-

Review Comment:
   unnecessary change.



##########
client/src/main/java/org/apache/celeborn/client/ShuffleClientImpl.java:
##########
@@ -1700,13 +1700,12 @@ private void mapEndInternal(
       throws IOException {
     final String mapKey = Utils.makeMapKey(shuffleId, mapId, attemptId);
     PushState pushState = getPushState(mapKey);
-
     try {
       limitZeroInFlight(mapKey, pushState);
-

Review Comment:
   unnecessary change.



##########
common/src/main/scala/org/apache/celeborn/common/CelebornConf.scala:
##########
@@ -4884,6 +4886,23 @@ object CelebornConf extends Logging {
       .timeConf(TimeUnit.MILLISECONDS)
       .createWithDefaultString("3s")
 
+  val RPC_TIMEOUT_RETRY_WAIT: ConfigEntry[Long] =
+    buildConf("celeborn.rpc.timeoutRetryWait")
+      .categories("network")
+      .version("0.6.0")
+      .doc("Wait time before next retry if RpcTimeoutException.")
+      .timeConf(TimeUnit.MILLISECONDS)
+      .createWithDefaultString("1s")
+
+  val CLIENT_CALL_LIFECYCLEMANAGER_MAX_RETRIES: ConfigEntry[Int] =
+    buildConf("celeborn.client.callLifecycleManager.maxRetries")
+      .withAlternative("celeborn.callLifecycleManager.maxRetries")
+      .categories("client")
+      .version("0.6.0")
+      .doc("Max retry times for client to reserve slots.")

Review Comment:
   > to reserve slots.
   
   Seems not only for reserving slots.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to