wgtmac commented on code in PR #216: URL: https://github.com/apache/parquet-format/pull/216#discussion_r1349706308
##########
src/main/thrift/parquet.thrift:
##########
@@ -216,13 +216,22 @@ struct Statistics {
    /** count of distinct values occurring */
    4: optional i64 distinct_count;
    /**
-    * Min and max values for the column, determined by its ColumnOrder.
+    * lower and upper bound values for the column, determined by its ColumnOrder.
+    * These may be the actual minimum and maximum values found on a column chunk,
+    * but can also be (more compact) values that do not exist on a column chunk.
+    * For example, instead of storing "Blart Versenwald III", a writer may set
+    * min_value="B", max_value="C". Such more compact values must still be valid
+    * values within the column's logical type.
     *
     * Values are encoded using PLAIN encoding, except that variable-length byte
     * arrays do not include a length prefix.
     */
    5: optional binary max_value;
    6: optional binary min_value;
+   /** If true, max_value is the actual maximum value found on a column chunk **/
+   7: optional bool is_max_value_exact;
+   /** If true, min_value is the actual minimum value found on a column chunk **/

Review Comment:
```suggestion
   /** If true, min_value is the actual minimum value for a column */
```

##########
src/main/thrift/parquet.thrift:
##########
@@ -216,13 +216,22 @@ struct Statistics {
    /** count of distinct values occurring */
    4: optional i64 distinct_count;
    /**
-    * Min and max values for the column, determined by its ColumnOrder.
+    * lower and upper bound values for the column, determined by its ColumnOrder.
+    * These may be the actual minimum and maximum values found on a column chunk,
+    * but can also be (more compact) values that do not exist on a column chunk.
+    * For example, instead of storing "Blart Versenwald III", a writer may set
+    * min_value="B", max_value="C". Such more compact values must still be valid
+    * values within the column's logical type.

Review Comment:
```suggestion
    * Lower and upper bound values for the column, determined by its ColumnOrder.
    *
    * These may be the actual minimum and maximum values found on a page or column
    * chunk, but can also be (more compact) values that do not exist on a page or
    * column chunk. For example, instead of storing "Blart Versenwald III", a writer
    * may set min_value="B", max_value="C". Such more compact values must still be
    * valid values within the column's logical type.
```

##########
src/main/thrift/parquet.thrift:
##########
@@ -216,13 +216,22 @@ struct Statistics {
    /** count of distinct values occurring */
    4: optional i64 distinct_count;
    /**
-    * Min and max values for the column, determined by its ColumnOrder.
+    * lower and upper bound values for the column, determined by its ColumnOrder.
+    * These may be the actual minimum and maximum values found on a column chunk,
+    * but can also be (more compact) values that do not exist on a column chunk.
+    * For example, instead of storing "Blart Versenwald III", a writer may set
+    * min_value="B", max_value="C". Such more compact values must still be valid
+    * values within the column's logical type.
     *
     * Values are encoded using PLAIN encoding, except that variable-length byte
     * arrays do not include a length prefix.
     */
    5: optional binary max_value;
    6: optional binary min_value;
+   /** If true, max_value is the actual maximum value found on a column chunk **/

Review Comment:
```suggestion
   /** If true, max_value is the actual maximum value for a column */
```
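
For illustration only, here is a rough Python sketch of how a writer might derive the compact bounds and exactness flags described in the comment above. It is not taken from the PR or from any Parquet implementation: `Bounds`, `truncate_lower`, `truncate_upper`, `compact_bounds`, and the `limit` parameter are hypothetical names, and the sketch assumes a plain byte-array column compared in unsigned bytewise order. For a UTF-8 string column, incrementing a byte could produce an invalid string, so a real writer would need extra care to keep the bound valid within the logical type.

```python
from typing import Iterable, NamedTuple, Optional


class Bounds(NamedTuple):
    """Hypothetical container mirroring the Statistics fields under review."""
    min_value: bytes
    max_value: bytes
    is_min_value_exact: bool
    is_max_value_exact: bool


def truncate_lower(value: bytes, limit: int) -> bytes:
    # A prefix never sorts after the full value in bytewise order,
    # so it is always a valid lower bound.
    return value[:limit]


def truncate_upper(value: bytes, limit: int) -> Optional[bytes]:
    # Values that already fit are kept unchanged (and stay exact).
    if len(value) <= limit:
        return value
    prefix = bytearray(value[:limit])
    # Trailing 0xFF bytes cannot be incremented, so drop them first.
    while prefix and prefix[-1] == 0xFF:
        prefix.pop()
    if not prefix:
        return None  # no shorter upper bound exists
    prefix[-1] += 1  # the result sorts strictly after the original value
    return bytes(prefix)


def compact_bounds(values: Iterable[bytes], limit: int) -> Bounds:
    # Assumes at least one non-null value; min() raises ValueError otherwise.
    values = list(values)
    lo, hi = min(values), max(values)
    min_value = truncate_lower(lo, limit)
    upper = truncate_upper(hi, limit)
    # Fall back to the full maximum when no compact bound can be built.
    max_value = hi if upper is None else upper
    return Bounds(min_value, max_value, min_value == lo, max_value == hi)


if __name__ == "__main__":
    print(compact_bounds([b"Blart Versenwald III"], limit=1))
```

With a one-byte limit, the single value "Blart Versenwald III" yields the bounds "B" and "C" from the example in the comment, with both exactness flags false. A reader can still use such bounds to skip data for a predicate that falls outside the range, but should only treat `min_value` or `max_value` as an actual column value when the corresponding flag is true.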