mapleFU commented on code in PR #43726:
URL: https://github.com/apache/arrow/pull/43726#discussion_r1742087940


##########
cpp/src/arrow/dataset/file_parquet.cc:
##########
@@ -366,9 +366,16 @@ std::optional<compute::Expression> ParquetFileFragment::EvaluateStatisticsAsExpr
     const parquet::Statistics& statistics) {
   auto field_expr = compute::field_ref(field_ref);
 
+  bool may_has_null = !statistics.HasNullCount() || statistics.null_count() > 0;
+  bool must_has_null = statistics.HasNullCount() && statistics.null_count() > 0;
   // Optimize for corner case where all values are nulls
-  if (statistics.num_values() == 0 && statistics.null_count() > 0) {
-    return is_null(std::move(field_expr));
+  if (statistics.num_values() == 0) {
+    if (must_has_null) {
+      return is_null(std::move(field_expr));
+    }
+    // If there are no values and no nulls, it might be empty or contains
+    // only null.
+    return std::nullopt;

Review Comment:
   I don't know. This might mean "no values", as in an empty page. I'm not sure
whether an empty page should return `is_null`. It might be OK, but it seems a bit
weird to me, because `is_null` would then stand for either all-null data or empty data.
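To make the case split under discussion concrete, here is a minimal, self-contained sketch of the guard as the hunk changes it. `StatsSketch`, `Outcome` and `Decide` are hypothetical names invented for illustration. The struct models only the three things the hunk reads from `parquet::Statistics` (`num_values()`, `HasNullCount()`, `null_count()`) and is not the real class. What happens after the guard is not shown in the hunk, so the sketch just labels it `kContinue`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for parquet::Statistics, modelling only what the
// hunk reads. It is NOT the real Arrow/Parquet class.
struct StatsSketch {
  std::int64_t num_values;                 // plays the role of num_values()
  std::optional<std::int64_t> null_count;  // empty <=> HasNullCount() is false
};

// Hypothetical labels for what the guard at the top of the function decides.
enum class Outcome {
  kIsNull,       // return is_null(field_expr)
  kNoPredicate,  // return std::nullopt: no expression, nothing to prune on
  kContinue      // fall through to the rest of the function (not in the hunk)
};

Outcome Decide(const StatsSketch& s) {
  assert(s.num_values >= 0 && "num_values is a count");
  // Same condition as must_has_null in the diff.
  const bool must_has_null = s.null_count.has_value() && *s.null_count > 0;
  if (s.num_values == 0) {
    if (must_has_null) {
      // No values but at least one null: the hunk treats this as all-null.
      return Outcome::kIsNull;
    }
    // Null count is zero (e.g. an empty page) or unknown: the data may be
    // empty or all-null, so the sketch produces no expression.
    return Outcome::kNoPredicate;
  }
  return Outcome::kContinue;
}
```

Under these assumptions, `Decide(StatsSketch{0, 0})` and `Decide(StatsSketch{0, std::nullopt})` both yield `Outcome::kNoPredicate`, which is exactly the empty-page case the comment is unsure about. Mapping that case to `kIsNull` instead would make `is_null` cover both all-null and empty data.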



-- 
This is an automated message from the Apache Git Service.