byteroll commented on PR #3261: URL: https://github.com/apache/celeborn/pull/3261#issuecomment-3901908502
> > From this link, https://docs.google.com/document/d/1YqK0kua-5rMufJw57kEIrHHGbLnAF9iXM5GdDweMzzg/edit?tab=t.0, in the v2 design it says _"The checks would incorrectly fail when Spark chooses to read data only partially for example in case of LIMIT queries"_, and the mitigation is _"Instead of doing the checks when the CelebornInputStream is closed, do the checks when we know that the stream has been fully read"_. How does that work? The CRC32 and byte count of the written data will not match those of the read data.
>
> It means the integrity checks are performed only after the complete data has been read; if the data is not fully read, we simply skip the check.

Got it. Thanks for the explanation.
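A minimal Python sketch of the "check only once the stream is fully read" idea. It assumes the writer publishes a CRC32 and a total byte count for the data. The reader accumulates both as it consumes bytes and compares them only when it reaches EOF. Closing the stream early, as a LIMIT query would, skips the check instead of failing it. `CheckedStream`, `IntegrityCheckError`, `expected_crc32` and `expected_bytes` are hypothetical names for illustration, not Celeborn's actual `CelebornInputStream` API.

```python
import io
import zlib


class IntegrityCheckError(Exception):
    """Raised when fully-read data does not match the writer's CRC32/length."""


class CheckedStream:
    """Hypothetical wrapper that verifies CRC32 and byte count only at EOF."""

    def __init__(self, raw, expected_crc32, expected_bytes):
        self._raw = raw
        self._expected_crc32 = expected_crc32
        self._expected_bytes = expected_bytes
        self._crc32 = 0
        self._bytes = 0
        self.verified = False

    def read(self, size=-1):
        chunk = self._raw.read(size)
        # Accumulate a running checksum and byte count over everything consumed.
        self._crc32 = zlib.crc32(chunk, self._crc32)
        self._bytes += len(chunk)
        # EOF is known when a read-all was requested, or a sized read returned nothing.
        if size < 0 or (size > 0 and not chunk):
            self._verify()
        return chunk

    def _verify(self):
        if self._bytes != self._expected_bytes or self._crc32 != self._expected_crc32:
            raise IntegrityCheckError(
                f"expected {self._expected_bytes} bytes / crc {self._expected_crc32:#010x}, "
                f"got {self._bytes} bytes / crc {self._crc32:#010x}"
            )
        self.verified = True

    def close(self):
        # Deliberately no check here: a partial read is not an integrity failure.
        self._raw.close()


data = b"shuffle block payload"
crc = zlib.crc32(data)

# Full read: the check runs when EOF is reached and passes.
full = CheckedStream(io.BytesIO(data), crc, len(data))
while full.read(4):
    pass
print(full.verified)  # True

# Partial read (like a LIMIT query): closing early bypasses the check.
partial = CheckedStream(io.BytesIO(data), crc, len(data))
partial.read(4)
partial.close()
print(partial.verified)  # False
```

Checking on EOF rather than in `close()` is what separates "the reader stopped early" from "the data is corrupted". Only the second case can raise.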
