CodingCat commented on PR #3569: URL: https://github.com/apache/celeborn/pull/3569#issuecomment-3699970392
> @CodingCat , I have not thought through what needs to be done to address it in Apache Spark - if there are concrete proposal, I can help review and evolve it. My suggestion would be to address it at the right layer.
>
> This appears to be a recurring issue in Spark, and has come up in past as well.
>
> Having said that, while I was trying to be constructive in making progress here, I have already given my comments and cant keep revisiting them - as currently formulated, I am not in favor of the (fairly nontrivial) change.
>
> If there is additional details/usecases and/or refinements which help I am happy to take a look/revisit my position.

I think that's the key difference of opinion here: I don't see Spark as the right layer to address this issue.

One of the major reasons is that a Spark-layer solution cannot be extended to a more advanced version of this PR, partition-level early deletion, given the vanilla Spark shuffle storage format (well, you can still work it out, but that will touch every piece of shuffle-related code).

To get a more cost-efficient solution for shuffle storage via early shuffle deletion, no matter which layer you build it on, you always need to trade off happy-path storage cost against bad-path compute cost. With the storage layout that remote shuffle systems provide, you can significantly improve the happy-path gains.

In summary, building on the Spark layer brings the same cost, if not higher. We already have a solution and can extend it to an even better one in remote shuffle systems, so why not?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
