danielcweeks commented on PR #535: URL: https://github.com/apache/parquet-format/pull/535#issuecomment-3613658184
After thinking this through a little more, I think we should more clearly define what each "versioned identifier" means and clearly articulate under what conditions it would change. For example:

## Magic Number `PAR1`

- What it means: Indicates that the footer is still Thrift-compatible with the Parquet V1 format and is expected to be parseable by any existing Parquet V1 client.
- When it changes: The footer is changed in a way that is incompatible with the Thrift definition, or the footer is substantively changed such that older clients should not even attempt to read the file.
- What it solves: Helps determine whether a file/footer is Parquet, corrupt, or something else, or is substantively different between versions.
- Example: Replacing the footer with a FlatBuffers representation in an incompatible way.

## Footer Version Number

- What it means: Largely redundant with `PAR1`.
- When it changes: (same as magic number)
- What it solves: Covers the case where the footer is stored somewhere outside of the file (e.g. in a cache or supplied via some other mechanism).
- Example: The footer is stored in a high-performance cache and keyed by path for faster pruning. No magic number is available to the reader.

## What changes have been made _without version updates_

1. [backward __incompatible__] New compression codecs: brotli, zstandard, etc.
2. [backward __incompatible__] New data types: variant, geo types, etc.
3. [backward compatible] Addition of page indexes
4. [backward compatible] Updates to column stats

Given that incompatible changes such as the addition of codecs were made without a version change, it's confusing as to why the addition of encodings would require a version change.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
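To make the "What it solves" point for the `PAR1` magic number concrete, here is a rough sketch of the check a reader might perform before trying to parse the footer. It assumes the usual file layout in which the file begins and ends with the 4-byte magic and the trailing magic is preceded by a 4-byte little-endian footer length. `read_footer_info` is a hypothetical helper written for illustration, not an API from any Parquet library.

```python
import struct

PARQUET_MAGIC = b"PAR1"  # the magic number discussed in the comment


def read_footer_info(data: bytes):
    """Return (looks_like_parquet_v1, footer_length) for a file's raw bytes.

    Hypothetical helper: it only inspects the magic bytes and the footer
    length field; it does not parse the Thrift footer itself.
    """
    # Smallest plausible file: leading magic + footer length + trailing magic.
    if len(data) < 12:
        return False, None
    # Both ends must carry the magic, otherwise the file is not Parquet V1,
    # is corrupt, or uses a substantively different footer.
    if data[:4] != PARQUET_MAGIC or data[-4:] != PARQUET_MAGIC:
        return False, None
    # The 4 bytes before the trailing magic hold the footer length (little-endian).
    (footer_len,) = struct.unpack("<I", data[-8:-4])
    # A footer longer than the space between the magics means corruption.
    if footer_len > len(data) - 12:
        return False, None
    return True, footer_len
```

A footer served from a cache has no surrounding magic bytes, so a check like this cannot run there, which is the gap the footer version number is meant to cover.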
