CodingCat commented on PR #3109: URL: https://github.com/apache/celeborn/pull/3109#issuecomment-2844012878
> I meant, why check for running stages, etc - we know it is going to fail- always abandon/delete ?

This is a good question. There are some corner cases we need to consider, e.g. stages 0, 1, 2 with dependencies 0 -> 1 and 0 -> 2.

Suppose stage 2 has finished reading some partitions of stage 0, and then stage 1 fails to fetch the same partitions. In this case, we should not delete the shuffle output from stage 0 if stage 2 is still running, since it is possible that stage 2 can continue reading without any problem. If stage 2 is not running, we can safely delete shuffle 0.

However, if shuffle 0 failed to be fetched because of a commit failure, then we can safely delete it as well, since both 1 and 2 will eventually fail.
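
To make the cases above concrete, here is a minimal sketch of the decision in Python. It is not Celeborn's actual code: `FetchFailureCause`, `can_delete_shuffle_output`, and all parameter names are hypothetical, and it assumes we know which stages read the shuffle and which stages are currently running.

```python
from enum import Enum, auto


class FetchFailureCause(Enum):
    """Hypothetical classification of why a shuffle fetch failed."""
    COMMIT_FAILURE = auto()  # the shuffle data was never committed correctly
    OTHER = auto()           # any other fetch failure


def can_delete_shuffle_output(cause, failed_reader, readers, running_stages):
    """Decide whether the upstream shuffle output may be deleted.

    cause          -- FetchFailureCause of the reported failure
    failed_reader  -- id of the stage that reported the fetch failure
    readers        -- ids of all stages that read this shuffle
    running_stages -- ids of stages that are currently running
    """
    # After a commit failure every reader will eventually fail,
    # so there is nothing worth preserving.
    if cause is FetchFailureCause.COMMIT_FAILURE:
        return True
    # Otherwise keep the output while any other reader is still running,
    # because that reader may finish without hitting the problem.
    other_running_readers = (set(readers) - {failed_reader}) & set(running_stages)
    return not other_running_readers


# Stage 1 fails to fetch shuffle 0 while stage 2 is still running: keep it.
print(can_delete_shuffle_output(FetchFailureCause.OTHER, 1, {1, 2}, {1, 2}))
# Stage 2 is no longer running: safe to delete.
print(can_delete_shuffle_output(FetchFailureCause.OTHER, 1, {1, 2}, {1}))
```

The commit-failure check comes first because in that case it does not matter whether other readers are running.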
