jiang13021 commented on PR #3534:
URL: https://github.com/apache/celeborn/pull/3534#issuecomment-3568756603

   > Nice catch. We did have the same problem, but I don’t know what scenarios
will cause data loss. Have you found the root cause? @jiang13021
   
   The data loss in this issue is caused by corrupted compressed data. We have
not yet identified the root cause of the corruption; we only know that it
happens occasionally. When the compressed body is corrupted, it directly
triggers a stage rerun. This time, however, it was the rarer case of a
corrupted header. Once this PR is merged, header corruption will also trigger
a stage rerun.
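   To illustrate the difference between body and header corruption, here is a minimal sketch under an assumed block layout. It is not the code in this PR and not Celeborn's actual wire format: the layout (`magic | compressed_len | original_len | crc32 | body`), the names `encode_block`, `decode_block` and `CorruptedBlockError`, and the use of zlib are all hypothetical. The point it shows is that a checksum over the body only catches body corruption; the header fields have to be validated separately, otherwise a corrupted length can misread data instead of failing loudly.

```python
import struct
import zlib

# Hypothetical block layout (NOT Celeborn's real format):
# magic (4 bytes) | compressed_len (int32) | original_len (int32) | crc32 of body (uint32) | body
MAGIC = b"BLK1"
HEADER = struct.Struct(">4siiI")  # big-endian, 16 bytes, no padding


class CorruptedBlockError(Exception):
    """Hypothetical error; a shuffle reader could report it as a fetch failure
    so that the scheduler reruns the stage instead of returning bad data."""


def encode_block(payload: bytes) -> bytes:
    body = zlib.compress(payload)
    return HEADER.pack(MAGIC, len(body), len(payload), zlib.crc32(body)) + body


def decode_block(buf: bytes) -> bytes:
    if len(buf) < HEADER.size:
        raise CorruptedBlockError("truncated header")
    magic, comp_len, orig_len, crc = HEADER.unpack_from(buf, 0)
    # Header validation: the body checksum says nothing about these fields.
    if magic != MAGIC:
        raise CorruptedBlockError("bad magic")
    if comp_len < 0 or orig_len < 0 or HEADER.size + comp_len != len(buf):
        raise CorruptedBlockError("header lengths inconsistent with buffer")
    body = buf[HEADER.size:]
    # Body validation: checksum, then decompression, then decompressed size.
    if zlib.crc32(body) != crc:
        raise CorruptedBlockError("body checksum mismatch")
    try:
        data = zlib.decompress(body)
    except zlib.error as e:
        raise CorruptedBlockError(f"decompression failed: {e}") from e
    if len(data) != orig_len:
        raise CorruptedBlockError("decompressed length mismatch")
    return data
```

   Flipping a byte in the magic, in either length field, or in the body makes `decode_block` raise instead of returning data, which is the behavior the comment describes: both kinds of corruption surface as a failure that can trigger a rerun.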


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.