emkornfield commented on code in PR #250:
URL: https://github.com/apache/parquet-format/pull/250#discussion_r1622614076


##########
README.md:
##########
@@ -118,6 +118,51 @@ chunks they are interested in.  The columns chunks should then be read sequentia
 
  ![File Layout](https://raw.github.com/apache/parquet-format/master/doc/images/FileLayout.gif)
 
+ ### PAR3 File Footers
+
+ PAR3 file footer footer format designed to better support wider-schemas and more control
+ over the various footer size vs compute trade-offs.  Its format is as follows:
+   - Serialized Thrift FileMetadata Structure
+   - (Optional) 4 byte CRC32 of the serialized Thrift FileMetadata.
+   - 4-byte length in bytes (little endian) of all preceding elements in the footer.
+   - 4-byte little-endian flag field to indicate features that require special parsing of the footer.
+     Readers MUST raise an error if there is an unrecognized flag.  Current flags:
+
+     * 0x01 - Footer encryption enabled (when set the encryption information is written before 
+        FileMeta structure as in the PAR1 footer).
+     * 0x02 - CRC32 of FileMetadata Footer.

Review Comment:
   @wgtmac yeah, I was imagining this for far fewer use-cases. For features that readers can detect as unsupported while they read, I think it is fine for that detection to happen lazily.
   
   > Put it differently the only feature we cannot encode inside the footer 
itself is if the footer is encrypted. For this it seems we can keep using a 
secondary magic number forever?
   
   @alkis at the very least compression.  If we switch to flatbuffers, I believe they compress quite well (a lot of extra padding in integers)?  Would we then have a few more magic footers for the cross product of compression and encryption?
   
   Again, I think there are only a handful of imagined use-cases for this, which is why I originally had it as a single byte. IMO it is a small cost to pay for some potential flexibility, and it is useful at least for encryption.
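
To make the trade-off concrete, here is a minimal sketch of how a reader might handle the flag field described in the diff above. It is only an illustration under several assumptions that the proposal does not settle: the function name `split_par3_footer` is hypothetical, the buffer is assumed to end exactly at the flag field (any trailing magic already stripped), the CRC is assumed to be the standard CRC-32 computed by `zlib.crc32` and stored little-endian, and the encryption information (flag 0x01) is left undecoded inside the returned bytes.

```python
import struct
import zlib

# Flag values taken from the proposed README text.
FLAG_FOOTER_ENCRYPTED = 0x01
FLAG_FOOTER_CRC32 = 0x02
KNOWN_FLAGS = FLAG_FOOTER_ENCRYPTED | FLAG_FOOTER_CRC32


def split_par3_footer(tail: bytes):
    """Return (flags, metadata_bytes) from a buffer that ends with the flag field.

    Hypothetical helper: assumes `tail` ends with
    [footer body][4-byte LE length][4-byte LE flags].
    """
    if len(tail) < 8:
        raise ValueError("footer tail too short")
    length, flags = struct.unpack("<II", tail[-8:])

    # Eager check: reject any flag bit this reader does not know about.
    unknown = flags & ~KNOWN_FLAGS
    if unknown:
        raise ValueError(f"unrecognized footer flags: {unknown:#x}")

    end = len(tail) - 8
    if length > end:
        raise ValueError("footer length exceeds available bytes")
    body = tail[end - length:end]

    if flags & FLAG_FOOTER_CRC32:
        if len(body) < 4:
            raise ValueError("footer too short to hold a CRC32")
        # Assumption: the CRC is the last 4 bytes of the body, little-endian.
        metadata = body[:-4]
        (expected,) = struct.unpack("<I", body[-4:])
        if zlib.crc32(metadata) != expected:
            raise ValueError("footer CRC32 mismatch")
    else:
        metadata = body

    # With FLAG_FOOTER_ENCRYPTED set, `metadata` would still begin with the
    # encryption information; decoding it is out of scope for this sketch.
    return flags, metadata
```

The point of the sketch is that a dedicated flag field lets the reader fail before touching the serialized metadata, whereas features detected lazily only surface once parsing is under way.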
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

