alkis commented on code in PR #250:
URL: https://github.com/apache/parquet-format/pull/250#discussion_r1620875846


##########
src/main/thrift/parquet.thrift:
##########
@@ -1127,18 +1229,48 @@ struct FileMetaData {
    * are flattened to a list by doing a depth-first traversal.
   * The column metadata contains the path in the schema for that column which can be
   * used to map columns to nodes in the schema.
-   * The first element is the root **/
-  2: required list<SchemaElement> schema;
+   * The first element is the root
+   *
+   * PAR1: Required
+   * PAR3: Use schema_page
+   **/
+  2: optional list<SchemaElement> schema;
+
+  /** Page has BYTE_ARRAY data where each element is REQUIRED.
+    *
+    * Each element is a serialized SchemaElement. The order and content should
+    * have a one-to-one correspondence with schema.
+    */
+  10: optional MetadataPage schema_page;

Review Comment:
   Why isn't this `list<binary>` where each binary is a serialized `SchemaElement`?
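   A minimal sketch of the random-access property the reviewer's `list<binary>` suggestion would give a reader. The length-prefixed framing here is purely illustrative (it is not the Thrift wire format), and all names are hypothetical:

```python
import struct

def pack_list(elements: list[bytes]) -> bytes:
    """Length-prefix each already-serialized element, as list<binary> would."""
    return b"".join(struct.pack("<I", len(e)) + e for e in elements)

def read_element(buf: bytes, index: int) -> bytes:
    """Skip over earlier elements without decoding them; return element `index`."""
    offset = 0
    for _ in range(index):
        (n,) = struct.unpack_from("<I", buf, offset)
        offset += 4 + n
    (n,) = struct.unpack_from("<I", buf, offset)
    return buf[offset + 4 : offset + 4 + n]

# Hypothetical serialized SchemaElements (opaque blobs for this sketch).
elements = [b"root", b"col_a", b"col_b"]
buf = pack_list(elements)
single = read_element(buf, 1)  # decodes only the framing, not the other elements
```

   With a single opaque page blob, by contrast, a reader must parse the whole page before it can reach any one element.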



##########
src/main/thrift/parquet.thrift:
##########
@@ -1115,6 +1189,34 @@ union EncryptionAlgorithm {
   2: AesGcmCtrV1 AES_GCM_CTR_V1
 }
 
+/**
+ * Embedded metadata page.
+ * 
+ * A metadata page is a data page used to store metadata about
+ * the data stored in the file. This is a key feature of PAR3
+ * footers, allowing deferred decoding of metadata.
+ *
+ * For common use cases the current recommendation is to use an
+ * encoding that supports random access, but implementations may choose
+ * other configuration parameters if necessary. Implementations
+ * SHOULD consider allowing configurability per page to allow for end-users
+ * to optimize size vs compute trade-offs that make sense for their use-case.
+ *
+ * Statistics for Metadata pages SHOULD NOT be written.
+ *
+ * Structs of this type should never be written in PAR1.
+ */
+struct MetadataPage {
+   // A serialized page including metadata thrift header and data.
+   1: required binary page
+   // Optional compression applied to the page.
+   2: optional CompressionCodec compression

Review Comment:
   My feeling is that most of the pages encoded in this form are going to be smallish, less than 1 KB. For such small sizes, none of the general-purpose compressors will do a good job at compressing them.
   
   Are there any benchmarks where we can see the effectiveness of compressing the above pages?
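   As a rough way to gather such numbers, here is a minimal sketch using zlib on two synthetic ~1 KB stand-ins (not real serialized SchemaElements; `pseudo_random` is a deterministic source of effectively incompressible bytes):

```python
import hashlib
import zlib

def compressed_size(payload: bytes, level: int = 6) -> int:
    """Size in bytes of the zlib-compressed payload."""
    return len(zlib.compress(payload, level))

def pseudo_random(n: int) -> bytes:
    """Deterministic, effectively incompressible bytes via a SHA-256 chain."""
    out, block = b"", b"seed"
    while len(out) < n:
        block = hashlib.sha256(block).digest()
        out += block
    return out[:n]

# Two synthetic 1000-byte "pages": one highly repetitive, one incompressible.
repetitive = b"optional binary col;" * 50
random_ish = pseudo_random(1000)

sizes = {
    "repetitive": compressed_size(repetitive),  # shrinks to a few dozen bytes
    "random_ish": compressed_size(random_ish),  # grows past 1000 bytes
}
```

   On payloads this small, a general-purpose codec only pays off when the content is highly repetitive; otherwise the codec's framing overhead can make the page larger, which is the effect the benchmark question is probing.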



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

