CodingCat commented on PR #2358: URL: https://github.com/apache/incubator-celeborn/pull/2358#issuecomment-1997976606
> Hi, @CodingCat
>
> Sorry for the late review.
>
> IMO, the current approaches to "adaptive memory management" do not effectively achieve the intended adaptability. Specifically, when the shuffle data is large, the maximum push threshold will keep increasing until it reaches `numPartitions * sendBufferSizeInBytes` (`executorMemory * 0.4` after [PR-2388](https://github.com/apache/incubator-celeborn/pull/2388)). This increases the risk of OOM errors, as there is currently no mechanism to lower the maximum push threshold.
>
> I think we need to refactor this feature. cc @waitinfuture
>
> The original purpose of the adaptiveness here is to resolve the problem that small partition data triggered too many pushes, and therefore a high cost of compression, etc., so I didn't really aim to reduce the buffer size.
>
> Of course, I agree it increases the OOM risk since we only ever increase the size, but I would also argue that if we keep increasing and decreasing the buffer, it could make write performance unstable. I think it is just a trade-off.

This feature is currently optional. If you don't have the small partition data problem like us, you can still use a static threshold like 64MB. But once you have this problem, you face the risk of suboptimal parameter tuning: you may tune it to 256MB unnecessarily when 128MB would have been sufficient, or you tune it multiple times (128MB, 256MB) and still hit such perf issues. That's the headache this feature is meant to address.

Something we could consider is adopting AIMD (additive increase, multiplicative decrease) as in TCP, but that deserves solid testing, as it brings way more complexity than it currently looks like. I can give it a shot.
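To make the AIMD idea concrete, here is a rough sketch of how such a controller could behave for a push threshold. It is not Celeborn code: the class name, the two signals (`on_small_pushes`, `on_memory_pressure`), and all sizes are hypothetical, and the sketch is in Python only for brevity. It only illustrates the shape of the policy: grow by a fixed step, shrink by a factor, and stay within a floor and a ceiling.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass
class AimdThreshold:
    """Hypothetical AIMD controller for a max push threshold, in bytes."""

    value: int            # current threshold
    floor: int            # never shrink below this
    ceiling: int          # never grow above this (e.g. a memory-derived cap)
    step: int             # additive increase applied per growth signal
    backoff: float = 0.5  # multiplicative decrease factor

    def on_small_pushes(self) -> int:
        # Additive increase: assumed signal that pushes are too small/frequent.
        self.value = min(self.ceiling, self.value + self.step)
        return self.value

    def on_memory_pressure(self) -> int:
        # Multiplicative decrease: assumed signal that memory is getting tight.
        self.value = max(self.floor, int(self.value * self.backoff))
        return self.value


ctl = AimdThreshold(value=64 * MB, floor=16 * MB, ceiling=256 * MB, step=32 * MB)
for _ in range(3):
    ctl.on_small_pushes()
grown_mb = ctl.value // MB    # 64 + 3 * 32 = 160

ctl.on_memory_pressure()
shrunk_mb = ctl.value // MB   # 160 * 0.5 = 80
print(grown_mb, shrunk_mb)
```

The asymmetry is the point: growth is slow and bounded, while a single pressure signal halves the threshold, which addresses the "increase only" concern at the cost of less stable buffer sizes. What counts as "memory pressure" and how often to evaluate it are exactly the parts that would need the testing mentioned above.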
