alkis commented on code in PR #254:
URL: https://github.com/apache/parquet-format/pull/254#discussion_r1706648829


##########
ExtensionExamples.md:
##########
@@ -0,0 +1,77 @@
+# Parquet extension examples
+
+To illustrate the applicability of the proposed specification we provide 
examples of fictional extensions to parquet and how migration can play out 
if/when the community decides to adopt them in the official specification.
+
+## Footer
+
+A variant of `FileMetaData` encoded in Flatbuffers is introduced. This variant 
is more performant and can scale to very wide tables, something that current 
Thrift `FileMetaData` struggles with.
+
+In its private form the footer of a Parquet file will look like so:
+
+    N-1 bytes | Thrift compact protocol encoded FileMetadata (minus \0 thrift stop field)
+    4 bytes   | 08 FF FF 01 (long form header for 32767: binary)
+    1-5 bytes | ULEB128(K+28) encoded size of the extension
+    K bytes   | Flatbuffers representation (v0) of FileMetaData
+    4 bytes   | little-endian crc32(flatbuffer)
+    4 bytes   | little-endian size(flatbuffer)
+    4 bytes   | little-endian crc32(size(flatbuffer))
+    16 bytes  | UUID1
+    1 byte    | \0 (thrift stop field)
+    4 bytes   | PAR1
+
+UUID1 is some UUID picked for this extension and it is used throughout 
(possibly internal) experimentation. It is put at the end to allow detection of 
the extension when parsed in reverse. The little-endian sizes and crc32s are 
also placed at the end to facilitate efficient parsing of the footer in reverse, 
without requiring a parse of the Thrift compact protocol that precedes it.

Review Comment:
   Switched to `some-UUID` and `some-other-UUID`.
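
   For readers following the thread, the reverse parse described in the quoted hunk could be sketched roughly as below. This is only an illustration that follows the byte listing literally, not code from the PR. `EXT_UUID` is a made-up placeholder, `build_tail` and `extract_extension` are hypothetical helper names, the CRC is assumed to be zlib's CRC-32, and `crc32(size(flatbuffer))` is assumed to cover the 4 little-endian size bytes.

```python
import struct
import uuid
import zlib

# Placeholder identifier; a real extension would pick its own UUID.
EXT_UUID = uuid.UUID("00000000-0000-0000-0000-000000000001").bytes
MAGIC = b"PAR1"
# crc32(flatbuffer) + size + crc32(size) + UUID + stop field + magic
TRAILER_LEN = 4 + 4 + 4 + 16 + 1 + 4


def uleb128(n):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def build_tail(thrift_body, flatbuf):
    """Assemble the footer bytes in the order given by the listing."""
    size_le = struct.pack("<I", len(flatbuf))
    return b"".join([
        thrift_body,                      # Thrift FileMetaData, stop field removed
        bytes.fromhex("08ffff01"),        # field header bytes copied from the listing
        uleb128(len(flatbuf) + 28),       # K + 28 bytes of extension payload
        flatbuf,
        struct.pack("<I", zlib.crc32(flatbuf)),
        size_le,
        struct.pack("<I", zlib.crc32(size_le)),
        EXT_UUID,
        b"\x00",                          # Thrift stop field
        MAGIC,
    ])


def extract_extension(tail):
    """Walk the footer backwards; return the flatbuffer bytes or None."""
    if len(tail) < TRAILER_LEN or tail[-4:] != MAGIC or tail[-5] != 0:
        return None
    if tail[-21:-5] != EXT_UUID:
        return None                       # extension not present
    (crc_of_size,) = struct.unpack("<I", tail[-25:-21])
    size_le = tail[-29:-25]
    if zlib.crc32(size_le) != crc_of_size:
        return None                       # size field is not trustworthy
    (size,) = struct.unpack("<I", size_le)
    (crc,) = struct.unpack("<I", tail[-33:-29])
    start = len(tail) - TRAILER_LEN - size
    if start < 0:
        return None
    flatbuf = tail[start:start + size]
    if zlib.crc32(flatbuf) != crc:
        return None
    return flatbuf
```

   Note that the reader never touches the Thrift bytes: the UUID check, the checksummed size, and the payload checksum are all found at fixed offsets from the end of the buffer.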



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

