emkornfield commented on code in PR #250:
URL: https://github.com/apache/parquet-format/pull/250#discussion_r1621075200


##########
README.md:
##########
@@ -118,6 +118,51 @@ chunks they are interested in.  The columns chunks should then be read sequentia
 
  ![File Layout](https://raw.github.com/apache/parquet-format/master/doc/images/FileLayout.gif)
 
+ ### PAR3 File Footers
+
+ PAR3 file footer footer format designed to better support wider-schemas and more control
+ over the various footer size vs compute trade-offs.  Its format is as follows:
+   - Serialized Thrift FileMetadata Structure
+   - (Optional) 4 byte CRC32 of the serialized Thrift FileMetadata.
+   - 4-byte length in bytes (little endian) of all preceding elements in the footer.
+   - 4-byte little-endian flag field to indicate features that require special parsing of the footer.
+     Readers MUST raise an error if there is an unrecognized flag.  Current flags:
+
+     * 0x01 - Footer encryption enabled (when set the encryption information is written before 
+        FileMeta structure as in the PAR1 footer).
+     * 0x02 - CRC32 of FileMetadata Footer.

Review Comment:
   For context, I considered widths from 8 bits to 64 bits, and I think 32 is a 
reasonable default.  The intent of this particular bitmap is to indicate to 
consumers that there is a backwards-incompatible change.  I think if we had had 
this originally, we wouldn't have used a different magic number.  Looking 
forward, I imagine these are things that might go here if we chose to pursue them:
   1. Allowing non-contiguous pages (i.e. making the offset index mandatory).
   2. Compressing the footer as a whole block (might be beneficial if we get 
to FlatBuffers).
   3. Changing to FlatBuffers in the future.
   
   I would hope these would be relatively rare, and the last flag of this 
bitmap can always be reserved to indicate yet another bitmap.
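
To make the intended reader behavior concrete, here is a minimal sketch of checking the flag bitmap. It is not taken from the PR. The two flag values come from the proposed README text above; the constant names and the `parse_footer_flags` helper are hypothetical, and the sketch assumes the caller has already located the 4-byte flag field in the footer.

```python
import struct

# Flag values as listed in the proposed README text.
# The constant names are illustrative assumptions, not part of the spec.
FLAG_FOOTER_ENCRYPTED = 0x01
FLAG_FOOTER_CRC32 = 0x02
KNOWN_FLAGS = FLAG_FOOTER_ENCRYPTED | FLAG_FOOTER_CRC32


def parse_footer_flags(raw: bytes) -> int:
    """Decode the 4-byte little-endian flag field and reject unknown bits."""
    if len(raw) != 4:
        raise ValueError(f"expected 4 bytes, got {len(raw)}")
    (flags,) = struct.unpack("<I", raw)
    # Any bit outside the known set signals a backwards-incompatible
    # feature this reader does not understand, so it must fail loudly.
    unknown = flags & ~KNOWN_FLAGS
    if unknown:
        raise ValueError(f"unrecognized footer flags: {unknown:#010x}")
    return flags
```

With a 32-bit field, a reader written against only these two flags would accept values 0 through 3 and raise on anything else, which is the "MUST raise an error" behavior the README text asks for.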
   
   @wgtmac what use cases were you thinking of?
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
