Re: [PR] [CELEBORN-894] End to End Integrity Checks [celeborn]

via GitHub Sat, 14 Feb 2026 05:19:34 -0800


byteroll commented on PR #3261:
URL: https://github.com/apache/celeborn/pull/3261#issuecomment-3901908502


   > > From this link,
   > > https://docs.google.com/document/d/1YqK0kua-5rMufJw57kEIrHHGbLnAF9iXM5GdDweMzzg/edit?tab=t.0
   > > the v2 design says _"The checks would incorrectly fail when Spark chooses to
   > > read data only partially for example in case of LIMIT queries"_, and the mitigation
   > > is _"Instead of doing the checks when the CelebornInputStream is closed, do the
   > > checks when we know that the stream has been fully read"_. How does that work?
   > > After a partial read, the CRC32 and byte count of the written data will not match those of the read data.
   > 
   > It means the integrity checks are performed only after the data has been read
   > completely; if the data is not fully read, the check is simply skipped.
   
   Got it. Thanks for the explanation.
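To make the "check only when fully read" rule concrete, here is a minimal sketch. It is not Celeborn's `CelebornInputStream` code. The class `IntegrityCheckingInputStream` and its `wasVerified()` method are hypothetical, and the sketch assumes the reader already knows the CRC32 and byte count recorded on the write side. It recomputes both while bytes are consumed, and in `close()` compares them only if the reader actually observed end-of-stream. A LIMIT-style reader that stops early therefore bypasses the check instead of failing it.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical wrapper for illustration only; not Celeborn's actual API.
class IntegrityCheckingInputStream extends FilterInputStream {
  private final long expectedCrc;   // CRC32 recorded by the writer (assumed known)
  private final long expectedBytes; // byte count recorded by the writer (assumed known)
  private final CRC32 crc = new CRC32();
  private long bytesRead = 0;
  private boolean reachedEof = false; // true once a read returned -1
  private boolean verified = false;   // true once the check ran and passed

  IntegrityCheckingInputStream(InputStream in, long expectedCrc, long expectedBytes) {
    super(in);
    this.expectedCrc = expectedCrc;
    this.expectedBytes = expectedBytes;
  }

  @Override
  public int read() throws IOException {
    int b = super.read();
    if (b == -1) reachedEof = true;
    else { crc.update(b); bytesRead++; }
    return b;
  }

  @Override
  public int read(byte[] buf, int off, int len) throws IOException {
    int n = super.read(buf, off, len);
    if (n == -1) reachedEof = true;
    else { crc.update(buf, off, n); bytesRead += n; }
    return n;
  }

  @Override
  public long skip(long n) throws IOException {
    // Route skipped bytes through read() so they still feed the checksum.
    byte[] scratch = new byte[8192];
    long remaining = n;
    while (remaining > 0) {
      int r = read(scratch, 0, (int) Math.min(scratch.length, remaining));
      if (r == -1) break;
      remaining -= r;
    }
    return n - remaining;
  }

  @Override
  public boolean markSupported() {
    return false; // mark/reset would double-count bytes in the checksum
  }

  @Override
  public void close() throws IOException {
    super.close();
    if (!reachedEof) {
      return; // partial read (e.g. LIMIT): bypass the integrity check
    }
    if (bytesRead != expectedBytes || crc.getValue() != expectedCrc) {
      throw new IOException("Integrity check failed: read " + bytesRead
          + " bytes, crc=" + crc.getValue() + "; expected " + expectedBytes
          + " bytes, crc=" + expectedCrc);
    }
    verified = true;
  }

  boolean wasVerified() {
    return verified;
  }

  public static void main(String[] args) throws IOException {
    byte[] data = "shuffle block".getBytes(StandardCharsets.UTF_8);
    CRC32 writerCrc = new CRC32();
    writerCrc.update(data, 0, data.length);

    // A reader that stops after 5 bytes: close() skips the check instead of failing.
    IntegrityCheckingInputStream partial = new IntegrityCheckingInputStream(
        new ByteArrayInputStream(data), writerCrc.getValue(), data.length);
    partial.read(new byte[5], 0, 5);
    partial.close();
    System.out.println("partial read verified: " + partial.wasVerified());
  }
}
```

In this sketch, "fully read" means the reader saw `-1` from a read call. A reader that consumes exactly the expected number of bytes but never reads past the end would also skip the check. A real implementation might additionally treat `bytesRead == expectedBytes` as completion.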

