mridulm opened a new pull request, #2446:
URL: https://github.com/apache/celeborn/pull/2446

   ### What changes were proposed in this pull request?
   
   `MasterNotLeaderException` should include not just the rpc address for 
applications, but also the internal rpc address (based on internal port) for 
workers to connect to the leader.
   
   I am leveraging `adminAddress` to persist this information in Ratis 
state, so that a client reconnects to the right rpc endpoint depending on 
whether it is an application or a worker.
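
As a rough illustration of the idea (not the code in this PR), the sketch below shows a leader hint that carries both addresses and lets the caller pick one by client type. The class `LeaderHint`, the enum `ClientType`, the method `addressFor`, and the host:port values are all hypothetical names chosen for this example. They are not Celeborn's actual API.

```java
import java.util.Objects;

// Hypothetical sketch: a "not leader" hint that carries both leader endpoints.
// None of these names come from the Celeborn code base.
final class LeaderHint {

    // Kind of client that received the "not leader" response.
    enum ClientType { APPLICATION, WORKER }

    private final String rpcAddress;          // endpoint applications connect to
    private final String internalRpcAddress;  // endpoint workers connect to (internal port)

    LeaderHint(String rpcAddress, String internalRpcAddress) {
        this.rpcAddress = Objects.requireNonNull(rpcAddress, "rpcAddress");
        this.internalRpcAddress =
            Objects.requireNonNull(internalRpcAddress, "internalRpcAddress");
    }

    // Returns the leader endpoint this type of client should reconnect to.
    String addressFor(ClientType type) {
        return type == ClientType.WORKER ? internalRpcAddress : rpcAddress;
    }
}
```

With a hint built as `new LeaderHint("master-1:9097", "master-1:8097")` (illustrative values), a worker would retry against `master-1:8097` while an application would retry against `master-1:9097`.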
   
   ### Why are the changes needed?
   
   Without this fix, when `MasterNotLeaderException` is thrown, workers 
reconnect to the rpc endpoint that applications use (rather than the one at 
the internal port), resulting in failures.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   No, bug fix.
   
   ### How was this patch tested?
   
   Tested with a local deployment - 3 masters + 2 workers, with authentication 
and internal port enabled.
   Without the fix, one or more workers fail to come up and fail with an 
exception.
   
   Additionally, existing tests pass and a new test was added.
   

