adriangb commented on code in PR #13795:
URL: https://github.com/apache/datafusion/pull/13795#discussion_r1888777746
##########
datafusion/physical-optimizer/src/pruning.rs:
##########
@@ -287,7 +287,12 @@ pub trait PruningStatistics {
/// predicate can never possibly be true). The container can be pruned (skipped)
/// entirely.
///
-/// Note that in order to be correct, `PruningPredicate` must return false
+/// While `PruningPredicate` will never return a `NULL` value, the
+/// rewritten predicate (as returned by `build_predicate_expression` and used internally
+/// by `PruningPredicate`) may evaluate to `NULL` when some of the min/max values
+/// or null / row counts are not known.
Review Comment:
This has always been true and is also clarified lower down in the same
docstring. I just wanted to repeat it here since it has caused confusion in
the past (even for @alamb!):
https://github.com/apache/datafusion/blob/f4e65d2d9711ed097982d2fbde4191c402c05023/datafusion/physical-optimizer/src/pruning.rs#L300-L316
The difference now is that if the null count or row count is null, we will
also return null in the case where we can't use the min/max stats to prove
that the file can be pruned.
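
To make the distinction concrete, here is a minimal standalone sketch of the
behavior described above. It does not use DataFusion's API: `ColumnStats`,
`and3`, `rewritten_eq`, and `should_keep` are hypothetical names, and the
rewritten form assumed for `x = value` (`x_null_count != x_row_count AND
x_min <= value AND value <= x_max`) is an illustration, not a copy of what
`build_predicate_expression` emits. `None` stands in for SQL `NULL`.

```rust
/// Hypothetical per-container statistics for one column; `None` means unknown.
#[derive(Clone, Copy, Debug)]
struct ColumnStats {
    min: Option<i64>,
    max: Option<i64>,
    null_count: Option<u64>,
    row_count: Option<u64>,
}

/// SQL-style three-valued AND: `false` wins over unknown, unknown wins over `true`.
fn and3(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        _ => None,
    }
}

/// Assumed rewrite of `x = value` against container statistics.
/// Returns `Some(false)` when the stats prove no row can match,
/// `Some(true)` when a match is possible, and `None` when unknown.
fn rewritten_eq(stats: &ColumnStats, value: i64) -> Option<bool> {
    // `x_null_count != x_row_count`: unknown if either count is missing.
    let not_all_null = match (stats.null_count, stats.row_count) {
        (Some(nulls), Some(rows)) => Some(nulls != rows),
        _ => None,
    };
    let min_ok = stats.min.map(|min| min <= value);
    let max_ok = stats.max.map(|max| value <= max);
    and3(not_all_null, and3(min_ok, max_ok))
}

/// The outer decision never yields an unknown: unknown is treated as "keep".
fn should_keep(stats: &ColumnStats, value: i64) -> bool {
    rewritten_eq(stats, value).unwrap_or(true)
}
```

With an unknown null count, the min/max comparison can still prove that a
container is prunable (`NULL AND false` is `false`). When min/max cannot prove
it, the result stays unknown and the container is kept.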
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]