CodingCat commented on PR #3569:
URL: https://github.com/apache/celeborn/pull/3569#issuecomment-3698617368

   we have many jobs like rdd1 -> rdd2 -> rdd3 -> S3
   
   rdd1 generates 500 TB of shuffle data, and that data sits on disk for 24 
hours because rdd3 takes that long to be written to S3. We have no way to 
release the reference to the shuffle dependency object, since 
rdd1 -> rdd2 -> rdd3 forms a reference chain and rdd3 is not released until 
the job finishes.
   
   i don't think Spark users without remote shuffle care about this feature 
as much, because a huge compute cluster can have tens of thousands of nodes, 
and tasks being spread everywhere amortize the disk pressure across a broad 
range of machines. Celeborn users do care about this, as we cannot run that 
many Celeborn machines due to cost constraints.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
