raunaqmorarka commented on code in PR #216:
URL: https://github.com/apache/parquet-format/pull/216#discussion_r1343751395


##########
src/main/thrift/parquet.thrift:
##########
@@ -216,13 +216,22 @@ struct Statistics {
    /** count of distinct values occurring */
    4: optional i64 distinct_count;
    /**
-    * Min and max values for the column, determined by its ColumnOrder.
+    * lower and upper bound values for the column, determined by its ColumnOrder.
+    * These may be the actual minimum and maximum values found on a column chunk,
+    * but can also be (more compact) values that do not exist on a column chunk.
+    * For example, instead of storing "Blart Versenwald III", a writer may set
+    * min_value="B", max_value="C". Such more compact values must still be valid
+    * values within the column's logical type.
     *
     * Values are encoded using PLAIN encoding, except that variable-length byte
     * arrays do not include a length prefix.
     */
    5: optional binary max_value;
    6: optional binary min_value;
+   /** If true, max_value is the actual maximum value found on a column chunk **/
+   7: optional bool is_max_value_exact;
+   /** If true, min_value is the actual minimum value found on a column chunk **/
+   8: optional bool is_min_value_exact;

Review Comment:
   I think these fields should be left unset whenever max_value/min_value are 
themselves unset. Some writer implementations may choose to leave them unset 
even after populating min/max; in that case, readers should assume, for safety, 
that the value is not exact.
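
   A rough sketch of the reader-side rule suggested above, in Python for 
brevity. The `Statistics` dataclass here is a hypothetical stand-in for the 
Thrift-generated struct, and the helper names (`max_is_exact`, `min_is_exact`, 
`is_well_formed`) are made up for illustration; none of this is taken from an 
existing Parquet implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Statistics:
    # Hypothetical stand-in for the Thrift-generated struct; only the
    # fields discussed in this thread are modelled. None means "unset".
    max_value: Optional[bytes] = None
    min_value: Optional[bytes] = None
    is_max_value_exact: Optional[bool] = None
    is_min_value_exact: Optional[bool] = None


def max_is_exact(stats: Statistics) -> bool:
    # An unset flag is treated like False: only an explicit True on a
    # populated max_value lets the reader treat it as the actual maximum.
    return stats.max_value is not None and stats.is_max_value_exact is True


def min_is_exact(stats: Statistics) -> bool:
    # Same conservative rule for the lower bound.
    return stats.min_value is not None and stats.is_min_value_exact is True


def is_well_formed(stats: Statistics) -> bool:
    # The exactness flag should be unset whenever its bound is unset.
    if stats.max_value is None and stats.is_max_value_exact is not None:
        return False
    if stats.min_value is None and stats.is_min_value_exact is not None:
        return False
    return True
```

   With this reading, a writer that populates min/max but omits the flags 
still produces valid statistics; readers simply fall back to treating the 
values as bounds rather than as actual minimum and maximum values.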


