cfmcgrady commented on PR #2358: URL: https://github.com/apache/incubator-celeborn/pull/2358#issuecomment-1997903706
Hi @CodingCat, sorry for the late review. IMO, the current approaches to "adaptive memory management" do not achieve the intended adaptability. Specifically, when the shuffle data is large, the maximum push threshold keeps increasing until it reaches `numPartitions * sendBufferSizeInBytes` (`executorMemory * 0.4` after [PR-2388](https://github.com/apache/incubator-celeborn/pull/2388)). Because there is currently no mechanism to lower the maximum push threshold again, this increases the risk of OOM errors. I think we need to refactor this feature. cc @waitinfuture
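To make the concern concrete, here is a minimal sketch of a push threshold that can only grow. The doubling growth rule, the helper names `max_push_threshold_cap` and `simulate`, and the partition and buffer sizes are hypothetical, chosen only for illustration; they are not Celeborn's actual implementation. The only detail taken from the comment above is the cap `numPartitions * sendBufferSizeInBytes`.

```python
# Hypothetical model of a push threshold that can only grow.
# The doubling growth rule and the parameter values are assumptions for
# illustration; they are not taken from Celeborn's code.


def max_push_threshold_cap(num_partitions: int, send_buffer_size: int) -> int:
    """Upper bound from the comment: numPartitions * sendBufferSizeInBytes."""
    return num_partitions * send_buffer_size


def simulate(initial: int, cap: int, steps: int) -> list[int]:
    """Grow the threshold on every step (assumed doubling), clamped to cap.

    Nothing in this model ever lowers the threshold, so once it
    reaches the cap it stays there for the rest of the run.
    """
    history = [initial]
    threshold = initial
    for _ in range(steps):
        threshold = min(threshold * 2, cap)
        history.append(threshold)
    return history


if __name__ == "__main__":
    MiB = 1024 * 1024
    # Hypothetical job: 2000 partitions with a 64 KiB send buffer each.
    cap = max_push_threshold_cap(num_partitions=2000, send_buffer_size=64 * 1024)
    print(cap // MiB, "MiB cap")
    print([t // MiB for t in simulate(initial=1 * MiB, cap=cap, steps=10)])
```

With these assumed numbers the threshold climbs to the 125 MiB cap within a few steps and then stays pinned there, because the model has no path that decreases it. Substituting `executorMemory * 0.4` as the cap changes only the ceiling, not this one-way behavior.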
