mapleFU commented on code in PR #242:
URL: https://github.com/apache/parquet-format/pull/242#discussion_r1603263217
##########
README.md:
##########
@@ -113,6 +119,55 @@ chunks they are interested in. The columns chunks should then be read sequentia

+### Parquet 3
+
+Parquet 3 files have the following overall structure:
+
+```
+4-byte magic number "PAR1"
+4-byte magic number "PAR3"
+8-byte offset of File Metadata v3
+8-byte length of File Metadata v3
+
+<Column 1 Chunk 1 + Column Metadata>
+<Column 2 Chunk 1 + Column Metadata>
+...
+<Column N Chunk 1 + Column Metadata>
+<Column 1 Chunk 2 + Column Metadata>
+<Column 2 Chunk 2 + Column Metadata>
+...
+<Column N Chunk 2 + Column Metadata>
+...
+<Column 1 Chunk M + Column Metadata>
+<Column 2 Chunk M + Column Metadata>
+...
+<Column N Chunk M + Column Metadata>
+
+<File-level Column 1 Metadata v3>
+...
+<File-level Column N Metadata v3>
+File Metadata v3
+
+File Metadata
+4-byte length in bytes of file metadata (little endian)
+4-byte magic number "PAR1"
+```
+
+The File Metadata v3 is designed to be light-weight to decode, regardless of
+the number of columns in the file. Individual column metadata can be opportunistically
+decoded depending on actual needs.
+
+This file structure is backwards-compatible. Parquet 1 readers will read the
+legacy File Metadata in the file footer, while Parquet 3 readers will notice
+the "PAR1PAR3" magic number (probably by reading the 24 first bytes in the file)
+and will instead read the File Metadata v3.

Review Comment:
   Yes, sorry, I meant a reader. This could work well if a metadata store has this info 🤔 But reading it from the header of a file is not the same as the current order.
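
To make the reader-side check in the quoted diff concrete, here is a minimal Python sketch of parsing the proposed 24-byte header. It is only an illustration under stated assumptions: the function name `read_v3_header` is hypothetical, and the offset and length are assumed to be little-endian unsigned 64-bit integers (the diff states little endian only for the legacy footer length).

```python
import struct

# Assumed layout of the proposed header, taken from the quoted diff:
#   4 bytes "PAR1", 4 bytes "PAR3", 8-byte offset, 8-byte length.
# ASSUMPTION: both 8-byte fields are little-endian unsigned integers.
HEADER_FORMAT = "<4s4sQQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 4 + 4 + 8 + 8 = 24 bytes


def read_v3_header(data: bytes):
    """Return (offset, length) of File Metadata v3, or None if absent.

    Hypothetical helper: a None result means the caller should fall back
    to the legacy File Metadata in the file footer.
    """
    if len(data) < HEADER_SIZE:
        return None
    magic1, magic3, offset, length = struct.unpack(
        HEADER_FORMAT, data[:HEADER_SIZE]
    )
    if magic1 != b"PAR1" or magic3 != b"PAR3":
        return None
    return offset, length
```

A reader following this sketch would fetch the first 24 bytes of the file, call `read_v3_header`, and then either read `length` bytes at `offset` or go to the footer as Parquet 1 readers do today.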
