emkornfield commented on code in PR #197:
URL: https://github.com/apache/parquet-format/pull/197#discussion_r1319461836


##########
src/main/thrift/parquet.thrift:
##########
@@ -977,6 +1038,25 @@ struct ColumnIndex {
 
   /** A list containing the number of null values for each page **/
   5: optional list<i64> null_counts
+
+  /**
+   * Contains repetition level histograms for more details) for each page
+   * concatenated together.  The repetition_level_histogram field on
+   * SizeStatistics contains more details.
+   *
+   * When present the length should always be (number of pages *
+   * (max_repetition_level + 1)) elements in size.
+   *
+   * Element 0 is the first element of the histogram for the first page.
+   * Element (max_repetition_level + 1) is the first element of the histogram
+   * for the second page.

Review Comment:
   Agreed on moving away from the "major" terminology and just assessing 
whether the comment matches what you want (which I think it does).
   
   The other "major" option would be something like: `Element 0 is the first 
element of the histogram for the first page.  Element 1 is the first element of 
the histogram for the second page.  The element at index num_pages is the 
second element of the histogram for the first page.`
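   
   To make the two orderings concrete, here is a rough sketch of the index 
arithmetic each one implies. The function names are hypothetical and are not 
part of the spec or of any Parquet library; `page` and `level` are zero-based.

```python
def page_grouped_index(page: int, level: int, max_repetition_level: int) -> int:
    """Index when each page's histogram is stored contiguously
    (the layout described in the comment under review)."""
    return page * (max_repetition_level + 1) + level


def level_grouped_index(page: int, level: int, num_pages: int) -> int:
    """Index for the alternative layout, where all pages' counts for
    one level are stored contiguously."""
    return level * num_pages + page


def page_histogram(histograms: list, page: int, max_repetition_level: int) -> list:
    """Slice one page's histogram out of the page-grouped list."""
    width = max_repetition_level + 1
    return histograms[page * width:(page + 1) * width]
```

   With the first layout, element `max_repetition_level + 1` is level 0 of the 
second page; with the second layout, element `num_pages` is level 1 of the 
first page.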


