alkis commented on code in PR #254:
URL: https://github.com/apache/parquet-format/pull/254#discussion_r1706648829
########## ExtensionExamples.md: ##########
@@ -0,0 +1,77 @@
+# Parquet extension examples
+
+To illustrate the applicability of the proposed specification, we provide examples of fictional extensions to Parquet and how migration can play out if/when the community decides to adopt them in the official specification.
+
+## Footer
+
+A variant of `FileMetaData` encoded in Flatbuffers is introduced. This variant is more performant and can scale to very wide tables, something that the current Thrift `FileMetaData` struggles with.
+
+In its private form, the footer of a Parquet file will look like so:
+
+    N-1 bytes | Thrift compact protocol encoded FileMetaData (minus \0 thrift stop field)
+    4 bytes   | 08 FF FF 01 (long form header for 32767: binary)
+    1-5 bytes | ULEB128(K+28) encoded size of the extension
+    K bytes   | Flatbuffers representation (v0) of FileMetaData
+    4 bytes   | little-endian crc32(flatbuffer)
+    4 bytes   | little-endian size(flatbuffer)
+    4 bytes   | little-endian crc32(size(flatbuffer))
+    16 bytes  | UUID1
+    1 byte    | \0 (thrift stop field)
+    4 bytes   | PAR1
+
+UUID1 is some UUID picked for this extension and it is used throughout (possibly internal) experimentation. It is put at the end to allow detection of the extension when the footer is parsed in reverse. The little-endian sizes and crc32s are also placed at the end to allow efficient parsing of the footer in reverse without having to parse the Thrift compact protocol that precedes it.

Review Comment:
Switched to `some-UUID` and `some-other-UUID`.
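A minimal Python sketch of the footer layout quoted in the diff: it assembles the extension field and then detects it by reading only the fixed-size tail from the end, never touching the Thrift bytes. It rests on assumptions the layout does not spell out: `crc32` is taken to be the standard CRC-32 from `zlib`, `crc32(size(flatbuffer))` is computed over the 4-byte little-endian size, the flatbuffer is treated as opaque bytes, and the Thrift prefix is a caller-supplied stand-in. The names `build_footer`, `find_extension` and `uleb128` are hypothetical, and the layout is followed literally (stop byte directly before `PAR1`).

```python
import struct
import zlib
from typing import Optional

# Long-form Thrift compact field header, copied verbatim from the layout
# (field id 32767, type binary).
EXT_FIELD_HEADER = bytes.fromhex("08FFFF01")
MAGIC = b"PAR1"
# crc32(fb) + size(fb) + crc32(size) + UUID = the "+28" in ULEB128(K+28).
TRAILER_LEN = 4 + 4 + 4 + 16


def uleb128(n: int) -> bytes:
    """Unsigned LEB128: 7 bits per byte, high bit set on all but the last."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def build_footer(thrift_no_stop: bytes, flatbuffer: bytes, ext_uuid: bytes) -> bytes:
    """Append the extension field, the stop byte and the magic to Thrift
    FileMetaData whose trailing stop byte has been removed."""
    if len(ext_uuid) != 16:
        raise ValueError("extension UUID must be 16 bytes")
    size_le = struct.pack("<I", len(flatbuffer))
    payload = (
        flatbuffer
        + struct.pack("<I", zlib.crc32(flatbuffer))
        + size_le
        + struct.pack("<I", zlib.crc32(size_le))
        + ext_uuid
    )  # K + 28 bytes
    return (
        thrift_no_stop
        + EXT_FIELD_HEADER
        + uleb128(len(payload))
        + payload
        + b"\x00"  # Thrift stop field
        + MAGIC
    )


def find_extension(footer: bytes, ext_uuid: bytes) -> Optional[bytes]:
    """Parse the footer from the end; return the flatbuffer bytes, or None
    if the extension is absent or fails its checksums."""
    tail = TRAILER_LEN + 1 + len(MAGIC)  # 33 fixed bytes at the very end
    if len(footer) < tail or footer[-4:] != MAGIC or footer[-5] != 0:
        return None
    if footer[-21:-5] != ext_uuid:
        return None  # UUID not found: treat as a plain Thrift footer
    crc_fb, size, crc_size = struct.unpack("<III", footer[-33:-21])
    if zlib.crc32(footer[-29:-25]) != crc_size:
        return None  # size field is corrupt, do not trust it
    if len(footer) < tail + size:
        return None
    flatbuffer = footer[-tail - size:-tail]
    if zlib.crc32(flatbuffer) != crc_fb:
        return None
    return flatbuffer
```

Because the UUID, the size and both checksums sit at fixed offsets from the end, a reader can check for the extension by looking at the last 33 bytes, and only then decide whether to read K more bytes of Flatbuffers or fall back to decoding the Thrift metadata.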
