pitrou commented on code in PR #50157:
URL: https://github.com/apache/arrow/pull/50157#discussion_r3412385042
##########
cpp/src/parquet/thrift_internal.h:
##########
@@ -290,6 +297,14 @@ static inline EncodedStatistics FromThrift(const
format::Statistics& stats) {
return out;
}
+static inline EncodedStatistics FromThrift(const format::Statistics& stats) {
Review Comment:
Is this overload still called somewhere? Should its call site be converted
to the new overload?
##########
cpp/src/parquet/schema.h:
##########
@@ -382,6 +382,19 @@ class PARQUET_EXPORT ColumnDescriptor {
return la ? GetSortOrder(la, pt) : GetSortOrder(converted_type(), pt);
}
+ // Whether ColumnOrder-governed min/max values have a supported ordering.
+ bool can_use_min_max() const {
+ switch (column_order().get_order()) {
+ case ColumnOrder::TYPE_DEFINED_ORDER:
+ return sort_order() != SortOrder::UNKNOWN;
+ case ColumnOrder::UNDEFINED:
+ return sort_order() == SortOrder::SIGNED;
Review Comment:
Can we comment on why we support this as a fallback?
`ColumnOrder::TYPE_DEFINED_ORDER` was apparently introduced in 2017 in the
Thrift definitions, the number of files that _don't_ expose a ColumnOrder
should be vanishingly small.
##########
cpp/src/parquet/metadata.cc:
##########
@@ -90,40 +90,64 @@ std::string ParquetVersionToString(ParquetVersion::type
ver) {
return "UNKNOWN";
}
+namespace {
+
+StatisticsMinMaxField GetStatisticsMinMaxField(const ColumnDescriptor& descr) {
+ switch (descr.column_order().get_order()) {
Review Comment:
Can we comment that this should be kept consistent with
`ColumnDescriptor::can_use_min_max`?
##########
cpp/src/parquet/schema.h:
##########
@@ -382,6 +382,19 @@ class PARQUET_EXPORT ColumnDescriptor {
return la ? GetSortOrder(la, pt) : GetSortOrder(converted_type(), pt);
}
+ // Whether ColumnOrder-governed min/max values have a supported ordering.
+ bool can_use_min_max() const {
+ switch (column_order().get_order()) {
+ case ColumnOrder::TYPE_DEFINED_ORDER:
+ return sort_order() != SortOrder::UNKNOWN;
+ case ColumnOrder::UNDEFINED:
+ return sort_order() == SortOrder::SIGNED;
Review Comment:
(in particular, it's not obvious a priori why `SIGNED` would be supported
but not `UNSIGNED`)
##########
cpp/src/parquet/schema.h:
##########
@@ -382,6 +382,19 @@ class PARQUET_EXPORT ColumnDescriptor {
return la ? GetSortOrder(la, pt) : GetSortOrder(converted_type(), pt);
}
+ // Whether ColumnOrder-governed min/max values have a supported ordering.
+ bool can_use_min_max() const {
+ switch (column_order().get_order()) {
+ case ColumnOrder::TYPE_DEFINED_ORDER:
+ return sort_order() != SortOrder::UNKNOWN;
+ case ColumnOrder::UNDEFINED:
+ return sort_order() == SortOrder::SIGNED;
Review Comment:
Something like:
```suggestion
case ColumnOrder::UNDEFINED:
// If there is no defined column order, the obsolete min and max
fields
// in the Statistics object are to be used, and they are always
sorted
// by signed comparison, so this has to be the column type's sort
order.
return sort_order() == SortOrder::SIGNED;
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]