RexXiong commented on code in PR #2629:
URL: https://github.com/apache/celeborn/pull/2629#discussion_r1697987581
##########
common/src/main/java/org/apache/celeborn/common/client/MasterClient.java:
##########
@@ -226,14 +225,23 @@ private void resetRpcEndpointRef(@Nullable RpcEndpointRef oldRef) {
* cannot be obtained.
* @return non-empty RpcEndpointRef.
*/
- private RpcEndpointRef getOrSetupRpcEndpointRef(AtomicInteger currentIndex) {
+ private RpcEndpointRef getOrSetupRpcEndpointRef(AtomicInteger currentIndex, int currentAttempt) {
RpcEndpointRef endpointRef = rpcEndpointRef.get();
+
+ List<String> activeMasterEndpoints = masterEndpointResolver.getActiveMasterEndpoints();
+ // If endpoints are updated by MasterEndpointResolver, we should reset the currentIndex to 0.
+ // This also unset the value of updated, so we don't always reset currentIndex to 0.
+ if (masterEndpointResolver.getUpdatedAndReset()) {
+ currentIndex.set(0);
+ maxRetries = Math.max(maxRetries, currentAttempt + activeMasterEndpoints.size());
Review Comment:
IMO, it would be better to leave maxRetries as is. If we raise maxRetries,
all subsequent retries will use the increased limit, which changes the
original behavior. For example, with 3 active master endpoints:
Time 1: maxRetries=3
Time 2: retry when currentAttempt=2, activeMasterEndpoints=3 -> maxRetries=5
Time 3: retry when currentAttempt=4, activeMasterEndpoints=3 -> maxRetries=7
Time 4: retry when currentAttempt=6, activeMasterEndpoints=3 -> maxRetries=9
....
If the resolver keeps changing, maxRetries can grow significantly, leading
to an excessive number of retry attempts in later iterations.
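
To make the arithmetic above concrete, here is a minimal sketch that replays the update from the diff with the same numbers. The class name `MaxRetriesGrowth` and the helper `bump` are hypothetical and exist only for illustration; the one line taken from the PR is the `Math.max(...)` expression.

```java
public class MaxRetriesGrowth {
    // Same expression as in the diff. If the result is written back to a
    // long-lived variable, the limit can only grow, never shrink.
    public static int bump(int maxRetries, int currentAttempt, int activeEndpoints) {
        return Math.max(maxRetries, currentAttempt + activeEndpoints);
    }

    public static void main(String[] args) {
        int maxRetries = 3;                 // Time 1
        int[] attemptsAtUpdate = {2, 4, 6}; // Times 2, 3, 4
        for (int attempt : attemptsAtUpdate) {
            maxRetries = bump(maxRetries, attempt, 3);
            System.out.println("currentAttempt=" + attempt + " -> maxRetries=" + maxRetries);
        }
    }
}
```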
Also, for each endpoint update, we should never exceed maxRetries. I think
we can use a temporary variable in the while loop: check whether the
endpoints were updated before calling `getOrSetupRpcEndpointRef`, and then
decide whether to increase that temporary variable.
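
A rough sketch of one possible reading of this suggestion is below. It is not the actual `MasterClient` code: `LocalRetryBudgetSketch`, its nested `Resolver` interface, `sendWithRetry`, and the `tryEndpoint` predicate are all hypothetical stand-ins, and only the two resolver method names come from the diff. The point is that the extra attempts live in a local variable, so the configured maxRetries is the same at the start of every call.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

public class LocalRetryBudgetSketch {
    /** Hypothetical stand-in for MasterEndpointResolver (method names from the diff). */
    public interface Resolver {
        List<String> getActiveMasterEndpoints();
        boolean getUpdatedAndReset();
    }

    private final int maxRetries; // configured value, never mutated
    private final Resolver resolver;

    public LocalRetryBudgetSketch(int maxRetries, Resolver resolver) {
        this.maxRetries = maxRetries;
        this.resolver = resolver;
    }

    public int getMaxRetries() {
        return maxRetries;
    }

    /** Returns the number of attempts made; tryEndpoint stands in for the real RPC. */
    public int sendWithRetry(Predicate<String> tryEndpoint) {
        int attemptLimit = maxRetries; // temp variable, scoped to this call only
        AtomicInteger currentIndex = new AtomicInteger(0);
        int currentAttempt = 0;
        while (currentAttempt < attemptLimit) {
            List<String> endpoints = resolver.getActiveMasterEndpoints();
            if (endpoints.isEmpty()) {
                break; // nothing to try
            }
            // Check for an update before picking the endpoint, and extend only
            // the local limit so the new endpoint list can be tried once.
            if (resolver.getUpdatedAndReset()) {
                currentIndex.set(0);
                attemptLimit = Math.max(attemptLimit, currentAttempt + endpoints.size());
            }
            String endpoint = endpoints.get(currentIndex.getAndIncrement() % endpoints.size());
            currentAttempt++;
            if (tryEndpoint.test(endpoint)) {
                break; // success
            }
        }
        return currentAttempt;
    }
}
```

Note that in this sketch a resolver that reports an update on every iteration would still keep extending the local limit within a single call, so a real change may also want a hard upper bound on it.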