mridulm commented on PR #3569: URL: https://github.com/apache/celeborn/pull/3569#issuecomment-3698598006
@CodingCat Assuming my understanding of the proposal is correct (which appears to be the case, based on the response): for specific jobs, application-level changes can be made to accommodate these requirements, without needing to update the platform. This is not specific to Apache Celeborn, by the way; it applies to any Spark application that generates persisted data and/or shuffle output.

One example approach: ensure references are released, and explicitly trigger GC periodically (Spark already does this, but users can force it as required), for example after an expensive query is done.

Unfortunately, this might not be optimal for all cases, but addressing the general case would require changes to Apache Spark itself, not working around it through Apache Celeborn. Even for the specific use cases being observed internally in your ecosystem, if/when query or usage patterns change, it will no longer work well.
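For reference, here is a sketch of the Spark-side setting this refers to, assuming a standard `spark-defaults.conf`. Spark's ContextCleaner removes shuffle, RDD, and broadcast state once the corresponding driver-side references have been garbage collected, and `spark.cleaner.periodicGC.interval` (default `30min`) controls how often the driver triggers that GC. The `10min` value below is purely illustrative, not a recommendation. Forcing a GC on demand (e.g. calling `System.gc()` on the driver once an expensive query's references have been dropped) is the other option described above.

```properties
# Illustrative value only: trigger the driver GC that drives ContextCleaner
# more often than the 30min default.
spark.cleaner.periodicGC.interval  10min
# Default is true; reference tracking must stay enabled for ContextCleaner
# to clean up shuffle/RDD/broadcast data once references are released.
spark.cleaner.referenceTracking    true
```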
