etseidl commented on code in PR #250:
URL: https://github.com/apache/parquet-format/pull/250#discussion_r1620997989


##########
src/main/thrift/parquet.thrift:
##########
@@ -812,8 +837,20 @@ struct ColumnMetaData {
 
   /** Set of all encodings used for pages in this column chunk.
    * This information can be used to determine if all data pages are
-   * dictionary encoded for example **/
+   * dictionary encoded for example  
+   *
+   *  PAR1: Optional. May be deprecated in a future release in favor
+   *        serialized_encoding_stats.
+   *  PAR3: Don't populate.  Write serialized_page_encoding_stats.
+   **/
   13: optional list<PageEncodingStats> encoding_stats;
+  /** 
+    * Serialized page encoding stats.
+    *
+    * PAR1: Start populating after encoding_stats is deprecated.
+    * PAR3: Populate instead of encoding_stats.
+    */
+  17: optional binary serialized_encoding_stats

Review Comment:
   But I just got this implemented 😭 😉. Joking aside, is there any other use 
for this field beyond optimizing dictionary based queries?
   
   Edit: looking above, encodings is now deprecated, so this field replaces 
that and serves the purpose of @alkis's boolean field. So if we adopt the 
latter, encodings will need to remain required I think.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to