jerqi commented on PR #1652:
URL: https://github.com/apache/incubator-uniffle/pull/1652#issuecomment-2081788299

   From an overall perspective, I still have two points I want to discuss.
   1. Fault tolerance and rebalance are different concepts, and we should 
distinguish them in the code. Although we can reuse some underlying techniques, 
we should avoid naming any underlying data structure after fault tolerance or 
rebalance.
   For example, if a shuffle server's load is too high, it can trigger a 
rebalance, but that doesn't mean it is a faulty server.
   2. Recording reassignments at the task-partition level may carry some risks. 
The shuffle services at Baidu and Alibaba don't use a similar design. We should 
think more carefully about this point.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
