mridulm opened a new pull request, #2126:
URL: https://github.com/apache/incubator-celeborn/pull/2126

   
   ### What changes were proposed in this pull request?
   
   As detailed in the JIRA, this PR improves replica selection when rack awareness is enabled.
   There are two main changes:
   
   * Reorder the workers in the input worker list to maximize rack diversity between elements in the list.
   * Change the round-robin implementation to select the replica independently, instead of deriving it from the primary.
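   The two changes above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Celeborn's allocator code: workers are plain strings, `rack_of` maps each worker to a rack name, and `interleave_by_rack`, `assign_independent` and `assign_primary_relative` are hypothetical names. `assign_primary_relative` is an assumed primary-relative scheme written only for contrast with the independent replica cursor.

```python
from collections import Counter, OrderedDict

# Hypothetical helpers for illustration only; not Celeborn's actual implementation.


def interleave_by_rack(workers, rack_of):
    """Reorder workers so that workers on the same rack are spread out.

    Groups workers by rack (keeping input order within a rack), then emits one
    worker from each rack per round, visiting larger racks first.
    """
    by_rack = OrderedDict()
    for w in workers:
        by_rack.setdefault(rack_of[w], []).append(w)
    # Stable sort: racks of equal size keep their first-seen order.
    groups = sorted(by_rack.values(), key=len, reverse=True)
    ordered = []
    for depth in range(len(groups[0]) if groups else 0):
        for g in groups:
            if depth < len(g):
                ordered.append(g[depth])
    return ordered


def assign_independent(workers, rack_of, num_partitions):
    """Primary and replica each advance their own round-robin cursor."""
    n = len(workers)
    p_cursor = r_cursor = 0
    pairs = []
    for _ in range(num_partitions):
        primary = workers[p_cursor % n]
        p_cursor += 1
        # The replica cursor never resets to the primary's position; it only
        # skips forward past workers that share the primary's rack.
        for _ in range(n):
            replica = workers[r_cursor % n]
            r_cursor += 1
            if rack_of[replica] != rack_of[primary]:
                break
        else:
            raise ValueError(f"no off-rack replica available for {primary}")
        pairs.append((primary, replica))
    return pairs


def assign_primary_relative(workers, rack_of, num_partitions):
    """For contrast (assumed scheme): replica = first off-rack worker after the primary."""
    n = len(workers)
    pairs = []
    for i in range(num_partitions):
        p = i % n
        primary = workers[p]
        for step in range(1, n):
            replica = workers[(p + step) % n]
            if rack_of[replica] != rack_of[primary]:
                break
        else:
            raise ValueError(f"no off-rack replica available for {primary}")
        pairs.append((primary, replica))
    return pairs


# Usage: a rack-grouped input list, as a naive worker list might be ordered.
rack_of = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B", "c1": "C"}
workers = list(rack_of)
print(interleave_by_rack(workers, rack_of))  # ['a1', 'b1', 'c1', 'a2', 'b2', 'a3']
# Most replicas landing on a single worker under each scheme:
print(max(Counter(r for _, r in assign_primary_relative(workers, rack_of, 6)).values()))  # 3
print(max(Counter(r for _, r in assign_independent(workers, rack_of, 6)).values()))  # 1
```

   With the rack-grouped input, the primary-relative variant sends half of the six replicas to `b1` (the first off-rack worker after the three rack-`A` workers), while the independent cursor places exactly one replica on each worker.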
   
   
   ### Why are the changes needed?
   
   In our productionization testing environment, we saw a disproportionate amount of traffic going to a small set of nodes when rack awareness was enabled. Because all Celeborn traffic was bottlenecked on those few nodes, the overall throughput of the entire deployment dropped.
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   No
   
   
   ### How was this patch tested?
   
   This was extensively tested in our environment and validated to improve overall performance. Note that this testing was against a modified version of 0.3, but the parts of master touched by these changes have not been updated since.
   In addition, the unit tests pass.
   


-- 
This is an automated message from the Apache Git Service.