mridulm opened a new pull request, #2446: URL: https://github.com/apache/celeborn/pull/2446
### What changes were proposed in this pull request?

`MasterNotLeaderException` should include not only the RPC address that applications use, but also the internal RPC address (based on the internal port) that workers use to connect to the leader. I am leveraging `adminAddress` to persist this information in Ratis state, so that a client reconnects to the right RPC endpoint depending on whether it is an application or a worker.

### Why are the changes needed?

Without this fix, when `MasterNotLeaderException` is thrown, workers connect to the RPC endpoint that applications use (rather than the one on the internal port), resulting in failures.

### Does this PR introduce _any_ user-facing change?

No, bug fix.

### How was this patch tested?

Tested with a local deployment: 3 masters + 2 workers, with authentication and internal port enabled. Without the fix, one or more workers fail to come up and fail with an exception. Additionally, existing tests pass and a new test was added.
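
For illustration only, the reconnect behaviour described under "What changes were proposed" could be sketched roughly as follows. This is not the code in the PR: the class `LeaderRedirectSketch`, the `NotLeaderInfo` holder, the `ClientKind` enum, the `reconnectTarget` method, and the host/port strings are all hypothetical names chosen for this sketch, and falling back to the application address when no internal address is present is an assumption.

```java
// Hypothetical sketch; none of these names are taken from the Celeborn code base.
final class LeaderRedirectSketch {

    /** Who is talking to the master: an application or a worker. */
    enum ClientKind { APPLICATION, WORKER }

    /** Stand-in for the information carried by a "not leader" error. */
    static final class NotLeaderInfo {
        final String leaderRpcAddress;         // endpoint applications connect to
        final String leaderInternalRpcAddress; // endpoint on the internal port (may be null)

        NotLeaderInfo(String leaderRpcAddress, String leaderInternalRpcAddress) {
            this.leaderRpcAddress = leaderRpcAddress;
            this.leaderInternalRpcAddress = leaderInternalRpcAddress;
        }
    }

    /**
     * Picks the address to reconnect to after a "not leader" error.
     * Workers prefer the internal address; applications always use the
     * regular one. The fallback for a missing internal address is an
     * assumption made for this sketch.
     */
    static String reconnectTarget(NotLeaderInfo info, ClientKind kind) {
        boolean hasInternal = info.leaderInternalRpcAddress != null
                && !info.leaderInternalRpcAddress.isEmpty();
        if (kind == ClientKind.WORKER && hasInternal) {
            return info.leaderInternalRpcAddress;
        }
        return info.leaderRpcAddress;
    }

    public static void main(String[] args) {
        // Host names and ports are made up for the example.
        NotLeaderInfo info = new NotLeaderInfo("master-2:9097", "master-2:8097");
        System.out.println("application -> " + reconnectTarget(info, ClientKind.APPLICATION));
        System.out.println("worker      -> " + reconnectTarget(info, ClientKind.WORKER));
    }
}
```

The point of the sketch is that the error carries both endpoints, so the choice between them can be made on the client side according to the caller's role rather than by the master.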
