mridulm commented on PR #3569: URL: https://github.com/apache/celeborn/pull/3569#issuecomment-3699790866
@CodingCat while I am sympathetic to the intent behind the change, this is not the right way to address it. Apache Spark has a reasonably robust ability to recompute lost data, but that exists primarily for fault tolerance, and it is being misused here. The rationale used in this PR applies to vanilla shuffle in Spark as well, and the analysis would be the same: it is unsound and violates how Spark currently expects shuffle to behave, which is why Spark relies on GC to clean up shuffle. Diverging nontrivially from Spark in Apache Celeborn will cause maintenance issues and hurt the ability to evolve the projects.

Having said that, I understand the pain point. I am open to proposals to evolve this in Apache Spark (which can then be leveraged in Celeborn).
