emkornfield commented on code in PR #250:
URL: https://github.com/apache/parquet-format/pull/250#discussion_r1622614076
##########
README.md:
##########
@@ -118,6 +118,51 @@ chunks they are interested in. The columns chunks should then be read sequentia
 
+
+### PAR3 File Footers
+
+PAR3 file footer footer format designed to better support wider-schemas and more control
+over the various footer size vs compute trade-offs. Its format is as follows:
+- Serialized Thrift FileMetadata Structure
+- (Optional) 4 byte CRC32 of the serialized Thrift FileMetadata.
+- 4-byte length in bytes (little endian) of all preceding elements in the footer.
+- 4-byte little-endian flag field to indicate features that require special parsing of the footer.
+  Readers MUST raise an error if there is an unrecognized flag. Current flags:
+
+  * 0x01 - Footer encryption enabled (when set the encryption information is written before
+    FileMeta structure as in the PAR1 footer).
+  * 0x02 - CRC32 of FileMetadata Footer.

Review Comment:
   @wgtmac yeah, I was imagining this for far fewer use-cases. For features that readers can detect as unsupported while they read, I think it is fine for the failure to happen lazily.

   > Put it differently the only feature we cannot encode inside the footer itself is if the footer is encrypted. For this it seems we can keep using a secondary magic number forever?

   @alkis at the very least, compression. If we switch to flatbuffers, I believe they compress quite well (a lot of extra padding in integers)? Would we then have a few more magic footers for the cross product of compression and encryption?

   Again, I think there are only a handful of imagined use-cases for this, which is why I originally had it as a single byte. IMO it is a small cost to pay for some potential flexibility, and it is useful at least for encryption.
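To make the proposed layout concrete, here is a minimal Python sketch of how a reader might split the footer described in the diff. Several things are assumptions, not part of the proposal as quoted: the function name `split_par3_footer` is hypothetical, the input is assumed to end exactly at the flag field (any trailing magic number already stripped), the CRC32 is assumed to be the zlib/IEEE variant stored little-endian, and the encryption information written ahead of FileMetadata is returned unparsed.

```python
import struct
import zlib

# Flag values as listed in the proposed README text.
FLAG_ENCRYPTED = 0x01
FLAG_CRC32 = 0x02
KNOWN_FLAGS = FLAG_ENCRYPTED | FLAG_CRC32


def split_par3_footer(tail: bytes):
    """Return (payload, is_encrypted) for a buffer ending in the flag field.

    `payload` is everything covered by the length field, minus the CRC if
    one is present. Assumption: no magic number follows the flag field.
    """
    if len(tail) < 8:
        raise ValueError("footer too short for length and flag fields")

    # The last 8 bytes are: 4-byte LE length, then 4-byte LE flags.
    length, flags = struct.unpack("<II", tail[-8:])

    # Readers MUST raise an error on any unrecognized flag.
    unknown = flags & ~KNOWN_FLAGS
    if unknown:
        raise ValueError(f"unrecognized footer flags: {unknown:#x}")

    end = len(tail) - 8
    if length > end:
        raise ValueError("footer length exceeds available bytes")
    body = tail[end - length:end]

    if flags & FLAG_CRC32:
        if len(body) < 4:
            raise ValueError("footer too short to hold a CRC32")
        payload = body[:-4]
        (expected,) = struct.unpack("<I", body[-4:])  # assumed little-endian
        if zlib.crc32(payload) != expected:
            raise ValueError("footer CRC32 mismatch")
    else:
        payload = body

    return payload, bool(flags & FLAG_ENCRYPTED)
```

Because the flag field sits at a fixed offset from the end, a reader can reject an unsupported footer before touching the Thrift bytes. That is the eager failure mode the flags are meant for, as opposed to the lazy detection discussed above.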
