[GitHub] [parquet-format] pitrou commented on pull request #126: PARQUET-1539: Clarify CRC checksum in page header

GitBox Tue, 13 Dec 2022 01:51:03 -0800


pitrou commented on PR #126:
URL: https://github.com/apache/parquet-format/pull/126#issuecomment-1348081137


   @bbraams @gszadovszky Could you explain why the spec's wording is so complex?
   
   It seems to me that the CRC is simply computed over the entire serialized
data, exactly as it is written to disk (after optional compression and
encryption, and including the repetition/definition levels area that is
prepended to the actual data). But the wording suggests that special care is
needed to accumulate the CRC over separate pieces of data, which may scare
implementors (context: https://github.com/apache/arrow/pull/14351 ).
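
To illustrate the interpretation above, here is a minimal Python sketch. It is
not taken from the spec or from any Parquet implementation. It assumes the
checksum is the standard CRC-32 exposed by `zlib.crc32`, and the byte strings
and function names are hypothetical placeholders for the on-disk page pieces.
It shows that accumulating the CRC piece by piece gives the same value as a
single pass over the concatenated bytes.

```python
import zlib

# Hypothetical stand-ins for the bytes that follow a page header on disk.
rep_levels = b"\x02\x00\x00\x00\x03\x01"
def_levels = b"\x02\x00\x00\x00\x03\x03"
values = b"example encoded values"


def page_crc_one_shot(page_bytes: bytes) -> int:
    """CRC-32 over the whole serialized page in a single call."""
    return zlib.crc32(page_bytes) & 0xFFFFFFFF


def page_crc_piecewise(*pieces: bytes) -> int:
    """CRC-32 accumulated over consecutive pieces, in on-disk order."""
    crc = 0
    for piece in pieces:
        # zlib.crc32 accepts the running CRC as its second argument.
        crc = zlib.crc32(piece, crc)
    return crc & 0xFFFFFFFF


page_bytes = rep_levels + def_levels + values
assert page_crc_one_shot(page_bytes) == page_crc_piecewise(
    rep_levels, def_levels, values
)
```

Under that assumption, a writer that already holds the page in one buffer can
use the one-shot form, while a writer that emits the levels and values
separately can accumulate the checksum as it goes. Both produce the same value.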
   
   Am I right in my interpretation above?
   


