CodingCat commented on PR #2358:
URL: 
https://github.com/apache/incubator-celeborn/pull/2358#issuecomment-1997976606

   > > Hi, @CodingCat
   > > Sorry for the late review.
   > > IMO, the current approaches to "adaptive memory management" do not 
effectively achieve the intended adaptability. Specifically, in cases where the 
shuffle data is of significant size, the maximum push threshold will continue 
to increase until it reaches `numPartitions * sendBufferSizeInBytes` 
(`executorMemory * 0.4` after 
[PR-2388](https://github.com/apache/incubator-celeborn/pull/2388)). This 
increases the risk of encountering OOM errors, as there is currently no 
mechanism in place to lower the maximum push threshold.
   > > I think we need to refactor this feature. cc @waitinfuture
   > 
   > The original purpose of the adaptiveness here was to resolve the problem 
that small partition data triggered too many pushes, and therefore a high cost of 
compression, etc., so I didn't really aim to reduce the buffer size.
   > 
   > Of course, I agree it increases the OOM risk since we only ever increase the 
size, but I would also argue that if we keep increasing and decreasing the buffer, it 
could make write performance unstable. I think it is just a trade-off.
   
   This feature is currently optional. If you don't have the small partition data 
problem like we do, you can still use a static threshold such as 64MB. But once you 
have this problem, you risk tuning the parameter suboptimally: 
you may set it to 256MB unnecessarily when 128MB would have been 
sufficient, or you may tune it several times (128MB, 256MB) and still face such 
perf issues. That is the headache this feature is meant to address.
   
   Something we could consider is adopting AIMD (additive increase, 
multiplicative decrease) as in TCP, but that deserves solid testing, since it 
brings far more complexity than it appears to. I can give it a 
shot.
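
   To make the AIMD idea concrete, here is a rough sketch of what such a 
threshold controller might look like. It is not code from this PR: the class 
name, the method names, and the two signals ("pushes are too frequent" and 
"memory pressure") are all hypothetical, and how Celeborn would actually detect 
either signal is left open.

```java
// Hypothetical sketch only; none of these names exist in Celeborn.
// AIMD: grow the push threshold by a fixed step, shrink it by a factor.
final class AimdPushThreshold {
    private final long minBytes;         // floor, e.g. the static default
    private final long maxBytes;         // hard cap to bound memory use
    private final long stepBytes;        // additive increase per signal
    private final double decreaseFactor; // multiplicative decrease, in (0, 1)
    private long current;

    AimdPushThreshold(long minBytes, long maxBytes,
                      long stepBytes, double decreaseFactor) {
        if (minBytes <= 0 || maxBytes < minBytes || stepBytes <= 0
                || decreaseFactor <= 0.0 || decreaseFactor >= 1.0) {
            throw new IllegalArgumentException("invalid AIMD parameters");
        }
        this.minBytes = minBytes;
        this.maxBytes = maxBytes;
        this.stepBytes = stepBytes;
        this.decreaseFactor = decreaseFactor;
        this.current = minBytes;
    }

    // Assumed signal: small partitions are triggering too many pushes.
    long onPushesTooFrequent() {
        current = Math.min(maxBytes, current + stepBytes);
        return current;
    }

    // Assumed signal: buffered data is putting the executor at OOM risk.
    long onMemoryPressure() {
        current = Math.max(minBytes, (long) (current * decreaseFactor));
        return current;
    }

    long current() {
        return current;
    }
}
```

   With a 64MB floor, a 256MB cap, a 32MB step and a factor of 0.5, two 
increases take the threshold to 128MB and one memory-pressure signal brings it 
back to 64MB. The slow growth and fast back-off are what would limit the OOM 
exposure, at the cost of the less stable write performance mentioned above.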


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
