zuston commented on code in PR #1738:
URL:
https://github.com/apache/incubator-uniffle/pull/1738#discussion_r1626987691
##########
client-spark/common/src/main/java/org/apache/spark/shuffle/RssSparkConfig.java:
##########
@@ -191,8 +191,20 @@ public class RssSparkConfig {
createIntegerBuilder(new
ConfigBuilder("spark.rss.client.unregister.thread.pool.size"))
.createWithDefault(10);
+ public static final ConfigEntry<Integer> RSS_CLIENT_UNREGISTER_TIMEOUT_SEC =
+ createIntegerBuilder(
+ new ConfigBuilder("spark.rss.client.unregister.timeout.sec")
+ .doc(
+ "Unregister requests are executed concurrently and all
requests together "
+ + "have to complete within this timeout."))
+ .createWithDefault(60);
Review Comment:
Please use the `ConfigOption` to init this config. The `configEntry` is not
a recommended way
##########
client-spark/common/src/main/java/org/apache/spark/shuffle/RssSparkConfig.java:
##########
@@ -191,8 +191,20 @@ public class RssSparkConfig {
createIntegerBuilder(new
ConfigBuilder("spark.rss.client.unregister.thread.pool.size"))
.createWithDefault(10);
+ public static final ConfigEntry<Integer> RSS_CLIENT_UNREGISTER_TIMEOUT_SEC =
+ createIntegerBuilder(
+ new ConfigBuilder("spark.rss.client.unregister.timeout.sec")
+ .doc(
+ "Unregister requests are executed concurrently and all
requests together "
+ + "have to complete within this timeout."))
+ .createWithDefault(60);
Review Comment:
I think we'd better to reserve the original 10s timeout, if this is too
short, let's optimize after having some user cases. WDYT?
##########
client/src/main/java/org/apache/uniffle/client/impl/ShuffleWriteClientImpl.java:
##########
@@ -984,11 +995,25 @@ public void unregisterShuffle(String appId, int
shuffleId) {
ShuffleServerClientFactory.getInstance()
.getShuffleServerClient(clientType, shuffleServerInfo,
rssConf);
RssUnregisterShuffleResponse response =
client.unregisterShuffle(request);
- if (response.getStatusCode() != StatusCode.SUCCESS) {
- LOG.warn("Failed to unregister shuffle to " +
shuffleServerInfo);
+ if (response.getStatusCode() == StatusCode.SUCCESS) {
+ LOG.info("Successfully unregistered shuffle from {}",
shuffleServerInfo);
+ } else {
+ LOG.warn("Failed to unregister shuffle from {}",
shuffleServerInfo);
}
} catch (Exception e) {
- LOG.warn("Error happened when unregistering to " +
shuffleServerInfo, e);
+ if (e.getCause() != null && e.getCause() instanceof
InterruptedException) {
+ // InterruptedException indicates timeout, give some advice
Review Comment:
Why the `InterruptedException` indicates the timeout? I think you could
directly use the `CompletableFutureExtension` that has been created in uniffle
common module.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]