alkis commented on code in PR #242: URL: https://github.com/apache/parquet-format/pull/242#discussion_r1609011897
##########
README.md:
##########
@@ -107,12 +113,97 @@ start locations. More details on what is contained in the metadata can be found
 in the Thrift definition.

 Metadata is written after the data to allow for single pass writing.
+This is especially useful when writing to backends such as S3.

 Readers are expected to first read the file metadata to find all the column
 chunks they are interested in. The columns chunks should then be read sequentially.

+### Parquet 3
+
+Parquet 3 files have the following overall structure:
+
+```
+4-byte magic number "PAR1"
+4-byte magic number "PAR3"
+
+<Column 1 Chunk 1 + Column Metadata>
+<Column 2 Chunk 1 + Column Metadata>
+...
+<Column N Chunk 1 + Column Metadata>
+<Column 1 Chunk 2 + Column Metadata>
+<Column 2 Chunk 2 + Column Metadata>
+...
+<Column N Chunk 2 + Column Metadata>
+...
+<Column 1 Chunk M + Column Metadata>
+<Column 2 Chunk M + Column Metadata>
+...
+<Column N Chunk M + Column Metadata>
+
+<File-level Column 1 Metadata v3>
+...
+<File-level Column N Metadata v3>
+
+File Metadata v3
+4-byte length in bytes of File Metadata v3 (little endian)
+4-byte magic number "PAR3"
+
+File Metadata
+4-byte length in bytes of File Metadata (little endian)
+4-byte magic number "PAR1"
+```
+
+Unlike the legacy File Metadata, the File Metadata v3 is designed to be light-weight
+to decode, regardless of the number of columns in the file. Individual column
+metadata can be opportunistically decoded depending on actual needs.
+
+This file structure is backwards-compatible. Parquet 1 readers will read and
+decode the legacy File Metadata in the file footer, while Parquet 3 readers
+will notice the "PAR3" magic number just before the File Metadata and will
+instead read and decode the File Metadata v3.

Review Comment:
The overhead of putting the new footer before the old is substantial especially for the wide column use case.
Footers for those files are on the order of MBs, and parsing them is on the critical path for reading any rowgroup/colchunk. The design as is will end up with 2-3 round trips to object stores for all the cases we care to optimise. Those fetches are going to take more time than parsing the old footer, effectively eliminating any benefit of the metadata optimisations.

`PAR3` is 4 arbitrary bytes just as much as `{0x08, 0x93, 0xCE, 0x00}` is. Except the latter has one additional property: those bytes are guaranteed to be ignored by thrift parsers, along with what follows them. If you turn off the "oh my we are parsing thrift manually" alarm, this is no worse or scarier than `PAR3`.

...

There might be another way. We could put the new footer after the old. Thrift terminates a struct with a null byte so it *knows* where the end is. What I don't know is whether the thrift parser will fail when there are trailing bytes after the end of a struct. I might have time to try it tomorrow.
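
To make the round-trip concern concrete, here is a minimal Python sketch of how a reader might locate File Metadata v3 under the layout proposed in the diff above. The function name `locate_v3_footer` and the use of an in-memory `bytes` object are illustrative assumptions only; this is not code from any Parquet implementation.

```python
import struct

MAGIC_V1 = b"PAR1"
MAGIC_V3 = b"PAR3"


def locate_v3_footer(file_bytes: bytes):
    """Return (offset, length) of File Metadata v3, or None if it is absent.

    Tail layout assumed from the proposal in this PR:
      v3 metadata | v3 length (4B LE) | "PAR3" |
      legacy metadata | legacy length (4B LE) | "PAR1"
    """
    n = len(file_bytes)
    if n < 8 or file_bytes[n - 4:] != MAGIC_V1:
        raise ValueError("missing trailing PAR1 magic")

    # Step 1: the last 8 bytes give the legacy footer length.
    (legacy_len,) = struct.unpack("<I", file_bytes[n - 8:n - 4])
    legacy_start = n - 8 - legacy_len
    if legacy_start < 8:
        return None  # no room for a v3 trailer before the legacy footer

    # Step 2: only now do we know where to look for "PAR3".
    if file_bytes[legacy_start - 4:legacy_start] != MAGIC_V3:
        return None  # plain Parquet 1 file

    # Step 3: read the v3 length and step back over the v3 metadata.
    (v3_len,) = struct.unpack("<I", file_bytes[legacy_start - 8:legacy_start - 4])
    v3_start = legacy_start - 8 - v3_len
    if v3_start < 0:
        raise ValueError("v3 footer length exceeds file size")
    return v3_start, v3_len
```

Against an object store, each step is a potential separate fetch unless a single speculative tail read happens to cover `8 + legacy_len + 8 + v3_len` bytes. With a legacy footer in the MB range, that tail read has to span the entire old footer just to reach the new one.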
