danielcweeks commented on PR #535:
URL: https://github.com/apache/parquet-format/pull/535#issuecomment-3613658184

   After thinking this through a little more, I think we should define more 
clearly what each "versioned identifier" means and state the conditions under 
which it would change.  For example:
   
   ## Magic Number `PAR1`:  
   - What it means: Indicates that the footer is still Thrift-compatible with 
the Parquet V1 format and is expected to be parseable by any existing Parquet 
V1 client.  
   - When it changes: the footer changes in a way that is incompatible with 
the Thrift definition, or changes so substantively that older clients should 
not even attempt to read the file.
   - What it solves: helps a reader determine whether a file/footer is valid 
Parquet, corrupt, or something else, and whether it differs substantively 
between versions.
   - Example: replacing the footer with a FlatBuffers representation in an 
incompatible way.
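
A minimal sketch of the check the magic number enables, assuming the standard layout in which a file starts with `PAR1` and ends with a 4-byte little-endian footer length followed by `PAR1`. The function name `read_footer_length` is hypothetical and not part of any Parquet library:

```python
import struct

MAGIC = b"PAR1"


def read_footer_length(data: bytes) -> int:
    """Return the footer length if `data` looks like a PAR1 file.

    Raises ValueError when the magic number is missing, which is the
    signal that the bytes are not Parquet, are corrupt, or use a footer
    that a V1 reader should not attempt to parse.
    """
    # Smallest possible file: leading magic + footer length + trailing magic.
    if len(data) < 12:
        raise ValueError("too short to be a Parquet file")
    if data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("PAR1 magic not found")
    # The 4 bytes before the trailing magic hold the footer length.
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    if footer_len > len(data) - 12:
        raise ValueError("footer length exceeds file size")
    return footer_len
```

A reader that sees a different magic number fails here, before it ever tries to decode the footer as Thrift.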
   
   ## Footer Version Number
   - What it means: largely redundant with the `PAR1` magic number
   - When it changes: (same as magic number)
   - What it solves: identifies the footer format when the footer is stored 
somewhere outside of the file (e.g. in a cache, or supplied via some other 
mechanism)
   - Example: the footer is stored in a high-performance cache and keyed by 
path for faster pruning, so no magic number is available to the reader.
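
To illustrate the cache case, here is a rough sketch in which the footer bytes are stored apart from the file and carry their own version number. Everything in it is hypothetical (the `CachedFooter` type, the `SUPPORTED_FOOTER_VERSIONS` set, the cache key); it only shows why a version inside the footer is useful when the magic number is out of reach:

```python
from dataclasses import dataclass

# Hypothetical: footer versions this reader knows how to parse.
SUPPORTED_FOOTER_VERSIONS = {1}


@dataclass(frozen=True)
class CachedFooter:
    """A footer held outside the file, so no magic number accompanies it."""

    footer_version: int
    payload: bytes


def can_parse(entry: CachedFooter) -> bool:
    # Without the file, the version field is the only compatibility signal.
    return entry.footer_version in SUPPORTED_FOOTER_VERSIONS


# Hypothetical cache keyed by file path.
cache = {
    "table/part-0.parquet": CachedFooter(footer_version=1, payload=b""),
    "table/part-1.parquet": CachedFooter(footer_version=2, payload=b""),
}

readable = sorted(path for path, entry in cache.items() if can_parse(entry))
```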
   
   ## What changes have been made _without version updates_:
   1. [backward __incompatible__] New compression codecs: Brotli, Zstandard, 
etc.
   2. [backward __incompatible__] New data types: Variant, geo types, etc.
   3. [backward compatible] Addition of Page indexes
   4. [backward compatible] Updates to Column Stats
   
   
   Given that incompatible changes such as the addition of codecs were made 
without a version change, it is unclear why the addition of encodings would 
require one.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

