sahibamatta commented on PR #1866: URL: https://github.com/apache/incubator-uniffle/pull/1866#issuecomment-2219350891
> When the write operation fails and throws an exception, we need to make the `commitShuffle` fail as well. According to your current approach, the `commitShuffle` will succeed regardless of whether the write operation is successful or not, which makes `commitShuffle` meaningless. The purpose of `commitShuffle` is to ensure that all events are successfully written in sync, with no residual shuffle data left in memory, before starting the downstream read stage.

Could you help me clarify the expected fix for this issue? The issue states that `commitShuffle` gets stuck forever if an exception occurs during the flush process, i.e. in `processFlushEvent` (see [this screenshot](https://github.com/apache/incubator-uniffle/assets/26037835/bd43e532-cc2c-43c2-b160-5af4edf28bd0)). I can see that there is a `while (true)` loop which only exits when the committed block ids are found. If the block ids are never found, the loop runs forever, or until the timeout is reached.

Do you think it makes sense to change the looping logic (i.e. remove the `while (true)`), or alternatively, when the exception is thrown from the `storageManager.write` call in `ShuffleFlushManager`, to throw a specific kind of exception that is non-retryable? Do you think bucketing exceptions as retryable/non-retryable makes sense in this scenario?

Also, let me know if we can connect and discuss/brainstorm this together. Thanks!
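To make the retryable/non-retryable idea concrete, here is a minimal, hypothetical sketch. It is not Uniffle's actual API: `CommitSketch`, `onFlushFailed`, and `NonRetryableFlushException` are invented names for illustration. The point is that the commit wait checks a failure flag set by the flush path on each iteration, so a non-retryable flush error surfaces immediately instead of spinning until the outer timeout:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a bounded commit wait that fails fast when the
// flush path reports a non-retryable error, instead of looping forever.
public class CommitSketch {

    /** Hypothetical marker for errors that retrying cannot fix (e.g. disk full). */
    static class NonRetryableFlushException extends RuntimeException {
        NonRetryableFlushException(String msg) { super(msg); }
    }

    private final Set<Long> committedBlockIds = ConcurrentHashMap.newKeySet();
    private volatile Throwable flushFailure; // set by the flush thread on error

    /** Called by the flush path when a block is successfully written. */
    void onBlockFlushed(long blockId) { committedBlockIds.add(blockId); }

    /** Called by the flush path when a write throws. */
    void onFlushFailed(Throwable t) { flushFailure = t; }

    /**
     * Waits until all expected blocks are committed, the timeout elapses,
     * or a non-retryable flush failure is observed.
     */
    boolean commitShuffle(Set<Long> expectedBlockIds, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            // Fail fast: if flushing failed non-retryably, the remaining
            // block ids will never arrive, so propagate instead of waiting.
            Throwable failure = flushFailure;
            if (failure instanceof NonRetryableFlushException) {
                throw (NonRetryableFlushException) failure;
            }
            if (committedBlockIds.containsAll(expectedBlockIds)) {
                return true; // all events written, safe to start the read stage
            }
            Thread.sleep(10);
        }
        return false; // timed out without seeing all block ids
    }
}
```

Under this sketch, a retryable error would simply leave `flushFailure` unset (or set to a retryable type) and let the loop keep polling until the timeout, while a non-retryable one aborts the commit right away.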
