Re: [PR] [CELEBORN-2244] shuffle early delete feature for Spark [celeborn]

via GitHub Tue, 30 Dec 2025 03:12:36 -0800


CodingCat commented on PR #3569:
URL: https://github.com/apache/celeborn/pull/3569#issuecomment-3699042600


   > @CodingCat Thanks for the details, I understand the motivation and 
rationale behind the proposal - and there are existing alternatives like 
checkpointing, temp materialization, etc. This will be more on the application 
design/impl side though - and apps will have to make their own tradeoffs (given 
there are costs involved).
   > 
   > As I mentioned earlier, this proposal itself is inherently unsound, and 
given the additional details provided, I am not in favor of introducing it into 
Apache Celeborn - my rationale/analysis would be the same if a variant of the 
proposal was made to Apache Spark for "vanilla" shuffle as well :-)
   > 
   > I am open to being corrected of course if there are other valid use cases 
and/or requirements I am missing!
   
   @mridulm Hi, I agree that shuffle cost can be reduced with something like 
Spark checkpointing, or by manually dumping intermediate data to S3. However, 
these approaches sacrifice performance significantly; e.g., in my experience, 
using S3 to store the intermediate results of a k-way join can slow queries 
down by almost 10X.
   
   
   The proposal here is essentially another alternative offered to the user: 
if your job has a "clean" lineage, you can reduce your shuffle cost by enabling 
this feature, at the price of a higher recovery cost on failure. As I said, 
this is not a feature for broad rollout, but only for certain types of jobs. 
(Actually, in my experience, most jobs will work fine with this feature, since 
in reality few jobs dump the same RDD to multiple locations, and jobs that 
reuse RDDs can always start quickly enough to fill in the lineage information 
this feature needs. On the other hand, only jobs with big shuffles get 
significant value from this feature, since small shuffle jobs do not account 
for much of your cluster capacity.)
   
   
   I’d like to better understand what specific invariant you believe this 
proposal violates when you call it “inherently unsound”.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

