CodingCat commented on PR #2358:
URL: https://github.com/apache/incubator-celeborn/pull/2358#issuecomment-1997919907

   > Hi, @CodingCat 
   > 
   > Sorry for the late review.
   > 
   > IMO, the current approaches to "adaptive memory management" do not 
effectively achieve the intended adaptability. Specifically, in cases where the 
shuffle data is of significant size, the maximum push threshold will continue 
to increase until it reaches `numPartitions * sendBufferSizeInBytes` 
(`executorMemory * 0.4` after 
[PR-2388](https://github.com/apache/incubator-celeborn/pull/2388)). This 
increases the risk of encountering OOM errors, as there is currently no 
mechanism in place to lower the maximum push threshold.
   > 
   > I think we need to refactor this feature. cc @waitinfuture 
   
   The original purpose of the adaptiveness here was to resolve the problem that 
small partition data triggered too many pushes, and therefore a high cost of 
compression, etc. So I didn't really aim to reduce the buffer size.
   
   Of course, I agree it increases the OOM risk, since we only ever increase the 
size. But I would also argue that if we keep increasing and decreasing the buffer, 
it could bring unstable write performance... I think it is just a trade-off.
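   To make the trade-off under discussion concrete, here is a minimal sketch (not 
Celeborn's actual code; the function, growth factor, and sizes are all assumptions 
for illustration) of an increase-only adaptive push threshold capped at 
`numPartitions * sendBufferSizeInBytes`. Because the threshold only ever grows, 
repeated growth signals pin it at the cap, which is the OOM concern; the upside is 
that the buffer size never oscillates.

```python
# Hypothetical sketch, not Celeborn's implementation: an increase-only
# adaptive push threshold capped at numPartitions * sendBufferSizeInBytes.

def grow_threshold(current, send_buffer_size, num_partitions, growth_factor=2):
    """Raise the push threshold when small pushes dominate; never lower it."""
    cap = num_partitions * send_buffer_size
    return min(current * growth_factor, cap)

send_buffer = 64 * 1024   # assumed 64 KiB per-partition send buffer
partitions = 1000
threshold = send_buffer

# Repeated growth signals drive the threshold up to the cap and keep it there;
# there is no path back down, so memory pressure only accumulates.
for _ in range(20):
    threshold = grow_threshold(threshold, send_buffer, partitions)

assert threshold == partitions * send_buffer  # pinned at the maximum
```

   A decrease path (e.g. shrinking on memory pressure) would bound the OOM risk, 
but, as noted above, at the cost of oscillating buffer sizes and less predictable 
write performance.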


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
