wgtmac commented on code in PR #197:
URL: https://github.com/apache/parquet-format/pull/197#discussion_r1317993757


##########
src/main/thrift/parquet.thrift:
##########
@@ -977,6 +1073,15 @@ struct ColumnIndex {
 
   /** A list containing the number of null values for each page **/
   5: optional list<i64> null_counts
+  /**
+    * Repetition and definition level histograms for the pages.
+    *
+    * This contains some redundancy with null_counts, however, to accommodate
+    * the widest range of readers both should be populated when either the max
+    * definition and repetition level meet the requirements specified in
+    * RepetitionDefinitionLevelHistogram.
+   **/
+  6: optional list<RepetitionDefinitionLevelHistogram> 
repetition_definition_level_histograms

Review Comment:
   Just want to add that usually the implementation does not directly expose 
thrift object to the end user and instead it wraps or converts to a 
language-native form. Using a nested list thrift definition may involve more 
conversion if a page-level histogram is required and in turn introduces more 
memory overhead.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to