alkis commented on code in PR #252:
URL: https://github.com/apache/parquet-format/pull/252#discussion_r1620328668
##########
src/main/thrift/parquet.thrift:
##########
@@ -242,43 +242,36 @@ struct SizeStatistics {
* All fields are optional.
*/
struct Statistics {
- /**
- * DEPRECATED: min and max value of the column. Use min_value and max_value.
- *
- * Values are encoded using PLAIN encoding, except that variable-length byte
- * arrays do not include a length prefix.
- *
- * These fields encode min and max values determined by signed comparison
- * only. New files should use the correct order for a column's logical type
- * and store the values in the min_value and max_value fields.
- *
- * To support older readers, these may be set when the column order is
- * signed.
- */
+ /* DEPRECATED: do not use */
1: optional binary max;
2: optional binary min;
/** count of null value in the column */
3: optional i64 null_count;
/** count of distinct values occurring */
4: optional i64 distinct_count;
/**
- * Lower and upper bound values for the column, determined by its ColumnOrder.
+ * Only one pair of max_value/min_value, max1/min1, max2/min2, max4/min4,
+ * max8/min8 can be set. The pair is determined by the physical type of the
+ * column. Floating point values are bitcasted to integers. Variable length
+ * values are set in min_value/max_value.
Review Comment:
I have rewritten this to be clearer.
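
To illustrate the rule in the comment text quoted above, here is a minimal Python sketch of how a writer might choose which min/max pair to populate. The field names (`min4`/`max4`, `min8`/`max8`, `min_value`/`max_value`) come from the quoted comment. The mapping from physical type to field width and the helper names `bitcast_float_to_int` and `stats_pair` are assumptions made for illustration. They are not part of the PR and may not match its final state.

```python
import struct


def bitcast_float_to_int(value: float, width: int) -> int:
    """Reinterpret the IEEE 754 bits of `value` as a signed integer of `width` bytes."""
    if width == 4:
        return struct.unpack("<i", struct.pack("<f", value))[0]
    if width == 8:
        return struct.unpack("<q", struct.pack("<d", value))[0]
    raise ValueError(f"unsupported width: {width}")


# Assumed mapping from physical type to the width suffix of the stats fields.
WIDTH_BY_PHYSICAL_TYPE = {"INT32": 4, "FLOAT": 4, "INT64": 8, "DOUBLE": 8}


def stats_pair(physical_type: str, lo, hi) -> dict:
    """Return the single min/max pair that would be set for this physical type."""
    if physical_type == "BYTE_ARRAY":
        # Variable-length values go in min_value/max_value.
        return {"min_value": bytes(lo), "max_value": bytes(hi)}
    width = WIDTH_BY_PHYSICAL_TYPE[physical_type]
    if physical_type in ("FLOAT", "DOUBLE"):
        # Floating point bounds are stored as their bit patterns.
        lo = bitcast_float_to_int(lo, width)
        hi = bitcast_float_to_int(hi, width)
    return {f"min{width}": lo, f"max{width}": hi}
```

Note that the bounds are compared as floating point values first and only bitcast for storage, since the integer bit patterns of negative floats do not sort in numeric order.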
##########
src/main/thrift/parquet.thrift:
##########
@@ -810,9 +803,13 @@ struct ColumnMetaData {
/** optional statistics for this column chunk */
12: optional Statistics statistics;
- /** Set of all encodings used for pages in this column chunk.
+ /**
+ * DEPRECATED: use is_fully_dict_encoded instead
Review Comment:
Agreed. I will create another PR for the Statistics change alone if we are
OK merging that now.
##########
src/main/thrift/parquet.thrift:
##########
@@ -242,43 +242,36 @@ struct SizeStatistics {
* All fields are optional.
*/
struct Statistics {
- /**
- * DEPRECATED: min and max value of the column. Use min_value and max_value.
- *
- * Values are encoded using PLAIN encoding, except that variable-length byte
- * arrays do not include a length prefix.
- *
- * These fields encode min and max values determined by signed comparison
- * only. New files should use the correct order for a column's logical type
- * and store the values in the min_value and max_value fields.
- *
- * To support older readers, these may be set when the column order is
- * signed.
- */
+ /* DEPRECATED: do not use */
1: optional binary max;
2: optional binary min;
/** count of null value in the column */
3: optional i64 null_count;
/** count of distinct values occurring */
4: optional i64 distinct_count;
/**
- * Lower and upper bound values for the column, determined by its ColumnOrder.
+ * Only one pair of max_value/min_value, max1/min1, max2/min2, max4/min4,
+ * max8/min8 can be set. The pair is determined by the physical type of the
+ * column. Floating point values are bitcasted to integers. Variable length
+ * values are set in min_value/max_value.
+ *
+ * Min and Max are the lower and upper bound values for the column,
+ * respectively, as determined by its ColumnOrder.
*
* These may be the actual minimum and maximum values found on a page or column
* chunk, but can also be (more compact) values that do not exist on a page or
* column chunk. For example, instead of storing "Blart Versenwald III", a writer
* may set min_value="B", max_value="C". Such more compact values must still be
* valid values within the column's logical type.
- *
- * Values are encoded using PLAIN encoding, except that variable-length byte
- * arrays do not include a length prefix.
*/
5: optional binary max_value;
6: optional binary min_value;
/** If true, max_value is the actual maximum value for a column */
7: optional bool is_max_value_exact;
/** If true, min_value is the actual minimum value for a column */
8: optional bool is_min_value_exact;
+ 9: optional i64 max8;
Review Comment:
Yes, I removed them because they provide little benefit and do not justify
the added complexity.
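
As an aside on the `min_value`/`max_value` comment in the hunk above: the "more compact" bounds it describes can be sketched as follows. This assumes unsigned bytewise lexicographic ordering, and the helper names `truncated_min` and `truncated_max` are hypothetical, not taken from any Parquet implementation.

```python
def truncated_min(value: bytes, n: int) -> bytes:
    """A prefix of `value` always sorts at or before `value`, so it is a valid lower bound."""
    return value[:n]


def truncated_max(value: bytes, n: int) -> bytes:
    """Return an upper bound for `value` of at most `n` bytes, when one exists."""
    if len(value) <= n:
        return value
    prefix = bytearray(value[:n])
    # Trailing 0xFF bytes cannot be incremented, so drop them first.
    while prefix and prefix[-1] == 0xFF:
        prefix.pop()
    if not prefix:
        # No shorter upper bound exists; fall back to the exact value.
        return value
    prefix[-1] += 1
    return bytes(prefix)
```

With `n = 1`, `b"Blart Versenwald III"` yields the bounds `b"B"` and `b"C"`, matching the example in the comment. For a string column, a real writer would also have to make sure the incremented bytes still form a valid value of the logical type (for example, valid UTF-8), which this sketch does not check.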
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]