zuston commented on code in PR #1738:
URL: 
https://github.com/apache/incubator-uniffle/pull/1738#discussion_r1626987691


##########
client-spark/common/src/main/java/org/apache/spark/shuffle/RssSparkConfig.java:
##########
@@ -191,8 +191,20 @@ public class RssSparkConfig {
       createIntegerBuilder(new 
ConfigBuilder("spark.rss.client.unregister.thread.pool.size"))
           .createWithDefault(10);
 
+  public static final ConfigEntry<Integer> RSS_CLIENT_UNREGISTER_TIMEOUT_SEC =
+      createIntegerBuilder(
+              new ConfigBuilder("spark.rss.client.unregister.timeout.sec")
+                  .doc(
+                      "Unregister requests are executed concurrently and all 
requests together "
+                          + "have to complete within this timeout."))
+          .createWithDefault(60);

Review Comment:
   Please use the `ConfigOption` to init this config. The `configEntry` is not 
a recommended way



##########
client-spark/common/src/main/java/org/apache/spark/shuffle/RssSparkConfig.java:
##########
@@ -191,8 +191,20 @@ public class RssSparkConfig {
       createIntegerBuilder(new 
ConfigBuilder("spark.rss.client.unregister.thread.pool.size"))
           .createWithDefault(10);
 
+  public static final ConfigEntry<Integer> RSS_CLIENT_UNREGISTER_TIMEOUT_SEC =
+      createIntegerBuilder(
+              new ConfigBuilder("spark.rss.client.unregister.timeout.sec")
+                  .doc(
+                      "Unregister requests are executed concurrently and all 
requests together "
+                          + "have to complete within this timeout."))
+          .createWithDefault(60);

Review Comment:
   I think we'd better to reserve the original 10s timeout, if this is too 
short, let's optimize after having some user cases. WDYT? 



##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -984,11 +995,25 @@ public void unregisterShuffle(String appId, int 
shuffleId) {
                   ShuffleServerClientFactory.getInstance()
                       .getShuffleServerClient(clientType, shuffleServerInfo, 
rssConf);
               RssUnregisterShuffleResponse response = 
client.unregisterShuffle(request);
-              if (response.getStatusCode() != StatusCode.SUCCESS) {
-                LOG.warn("Failed to unregister shuffle to " + 
shuffleServerInfo);
+              if (response.getStatusCode() == StatusCode.SUCCESS) {
+                LOG.info("Successfully unregistered shuffle from {}", 
shuffleServerInfo);
+              } else {
+                LOG.warn("Failed to unregister shuffle from {}", 
shuffleServerInfo);
               }
             } catch (Exception e) {
-              LOG.warn("Error happened when unregistering to " + 
shuffleServerInfo, e);
+              if (e.getCause() != null && e.getCause() instanceof 
InterruptedException) {
+                // InterruptedException indicates timeout, give some advice

Review Comment:
   Why the `InterruptedException` indicates the timeout? I think you could 
directly use the `CompletableFutureExtension` that has been created in uniffle 
common module.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to