Re: [PR] [CELEBORN-2244] shuffle early delete feature for Spark [celeborn]

via GitHub Tue, 30 Dec 2025 00:02:48 -0800


mridulm commented on PR #3569:
URL: https://github.com/apache/celeborn/pull/3569#issuecomment-3698598006


   @CodingCat Assuming my understanding of the proposal is correct (which 
appears to be so based on the response) - for specific jobs, there are 
application-level updates which can be made to accommodate the requirements 
without needing to update the platform. This is not specific to Apache Celeborn 
btw, but applicable to any Spark application which is generating persisted data 
and/or shuffle output.
   
   An example approach could be to ensure references are released, and to 
trigger GC explicitly (Spark already does it periodically, but users can force 
it as required): after an expensive query is done, etc. This might not be 
optimal for all cases unfortunately - but addressing the general case would 
require updates to Apache Spark, not working around it through Apache Celeborn.
   
   Even for the specific use cases being observed internally in your 
ecosystem - if/when query/usage patterns change, it will no longer work well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
