bkietz commented on a change in pull request #10253:
URL: https://github.com/apache/arrow/pull/10253#discussion_r627547092



##########
File path: cpp/src/arrow/dataset/file_parquet.cc
##########
@@ -198,17 +198,22 @@ static util::optional<compute::Expression> 
ColumnChunkStatisticsAsExpression(
   auto maybe_min = min->CastTo(field->type());
   auto maybe_max = max->CastTo(field->type());
   if (maybe_min.ok() && maybe_max.ok()) {
-    auto col_min = maybe_min.MoveValueUnsafe();
-    auto col_max = maybe_max.MoveValueUnsafe();
-    if (col_min->Equals(col_max)) {
-      return compute::equal(std::move(field_expr), 
compute::literal(std::move(col_min)));
+    min = maybe_min.MoveValueUnsafe();
+    max = maybe_max.MoveValueUnsafe();
+    if (min->Equals(max)) {
+      return compute::equal(std::move(field_expr), 
compute::literal(std::move(min)));

Review comment:
       Equality expressions will be picked up by the ExtractKnownFieldValues 
pass, which assumes the field will `min` and never null (questionably correct)
   
   Longer term: I think we need to be more rigorous with the guarantee 
contract. Specifically: any guarantee should be usable as a filter, and in that 
case will evaluate to an array containing only `true` (and no nulls).
   
   | Data, Guarantee | Filter | Evaluation of guarantee on Data | |
   |---|---|---|---|
   | [2, 2], i32 == 2 | i32 == 2 | [true, true] | can simplify to `true` |
   | [2, null], i32 == 2 or is_null(i32) | i32 == 2 | [true, null] | the second 
row should be excluded from the filter |
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to