pitrou commented on code in PR #50157:
URL: https://github.com/apache/arrow/pull/50157#discussion_r3412385042


##########
cpp/src/parquet/thrift_internal.h:
##########
@@ -290,6 +297,14 @@ static inline EncodedStatistics FromThrift(const 
format::Statistics& stats) {
   return out;
 }
 
+static inline EncodedStatistics FromThrift(const format::Statistics& stats) {

Review Comment:
   Is this overload still called somewhere? Should its call site be converted 
to the new overload?



##########
cpp/src/parquet/schema.h:
##########
@@ -382,6 +382,19 @@ class PARQUET_EXPORT ColumnDescriptor {
     return la ? GetSortOrder(la, pt) : GetSortOrder(converted_type(), pt);
   }
 
+  // Whether ColumnOrder-governed min/max values have a supported ordering.
+  bool can_use_min_max() const {
+    switch (column_order().get_order()) {
+      case ColumnOrder::TYPE_DEFINED_ORDER:
+        return sort_order() != SortOrder::UNKNOWN;
+      case ColumnOrder::UNDEFINED:
+        return sort_order() == SortOrder::SIGNED;

Review Comment:
   Can we comment on why we support this as a fallback? 
`ColumnOrder::TYPE_DEFINED_ORDER` was apparently introduced in 2017 in the 
Thrift definitions, the number of files that _don't_ expose a ColumnOrder 
should be vanishingly small.



##########
cpp/src/parquet/metadata.cc:
##########
@@ -90,40 +90,64 @@ std::string ParquetVersionToString(ParquetVersion::type 
ver) {
   return "UNKNOWN";
 }
 
+namespace {
+
+StatisticsMinMaxField GetStatisticsMinMaxField(const ColumnDescriptor& descr) {
+  switch (descr.column_order().get_order()) {

Review Comment:
   Can we comment that this should be kept consistent with 
`ColumnDescriptor::can_use_min_max`?



##########
cpp/src/parquet/schema.h:
##########
@@ -382,6 +382,19 @@ class PARQUET_EXPORT ColumnDescriptor {
     return la ? GetSortOrder(la, pt) : GetSortOrder(converted_type(), pt);
   }
 
+  // Whether ColumnOrder-governed min/max values have a supported ordering.
+  bool can_use_min_max() const {
+    switch (column_order().get_order()) {
+      case ColumnOrder::TYPE_DEFINED_ORDER:
+        return sort_order() != SortOrder::UNKNOWN;
+      case ColumnOrder::UNDEFINED:
+        return sort_order() == SortOrder::SIGNED;

Review Comment:
   (in particular, it's not obvious a priori why `SIGNED` would be supported 
but not `UNSIGNED`)



##########
cpp/src/parquet/schema.h:
##########
@@ -382,6 +382,19 @@ class PARQUET_EXPORT ColumnDescriptor {
     return la ? GetSortOrder(la, pt) : GetSortOrder(converted_type(), pt);
   }
 
+  // Whether ColumnOrder-governed min/max values have a supported ordering.
+  bool can_use_min_max() const {
+    switch (column_order().get_order()) {
+      case ColumnOrder::TYPE_DEFINED_ORDER:
+        return sort_order() != SortOrder::UNKNOWN;
+      case ColumnOrder::UNDEFINED:
+        return sort_order() == SortOrder::SIGNED;

Review Comment:
   Something like:
   ```suggestion
         case ColumnOrder::UNDEFINED:
           // If there is no defined column order, the obsolete min and max 
fields
           // in the Statistics object are to be used, and they are always 
sorted
           // by signed comparison, so this has to be the column type's sort 
order.
           return sort_order() == SortOrder::SIGNED;
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to