appletreeisyellow commented on code in PR #9223:
URL: https://github.com/apache/arrow-datafusion/pull/9223#discussion_r1528987024
##########
datafusion/core/src/physical_optimizer/pruning.rs:
##########
@@ -318,6 +327,19 @@ pub trait PruningStatistics {
/// `x = 5 AND y = 10` | `x_min <= 5 AND 5 <= x_max AND y_min <= 10 AND 10 <= y_max`
/// `x IS NULL` | `x_null_count > 0`
///
+/// In addition, for a given column `x`, the `x_null_count` and `x_row_count` will
+/// be compared using a `CASE` statement to wrap the rewritten predicate to handle
+/// the case where the column `x` is known to be all `NULL`s. Note this
+/// is different from knowing nothing about the column `x`, which confusingly is
+/// encoded by returning `NULL` for the min/max values from [`PruningStatistics::min_values`].
+///
+/// Original Predicate | Rewritten Predicate
+/// ------------------ | --------------------
+/// `x = 5` | `CASE WHEN x_null_count = x_row_count THEN false ELSE x_min <= 5 AND 5 <= x_max END`
+/// `x < 5` | `CASE WHEN x_null_count = x_row_count THEN false ELSE x_max < 5 END`
+/// `x = 5 AND y = 10` | `CASE WHEN x_null_count = x_row_count THEN false ELSE x_min <= 5 AND 5 <= x_max END AND CASE WHEN y_null_count = y_row_count THEN false ELSE y_min <= 10 AND 10 <= y_max END`
+/// `x IS NULL` | `CASE WHEN x_null_count = x_row_count THEN false ELSE x_null_count > 0 END`
+///
Review Comment:
> Do you mean if `x_row_count` is available, the rewritten predicate will be different?

The rewritten predicate is the same regardless of whether `x_row_count` is available.
I think the current comment, with a separate `x_row_count` example, could be misleading. I'll update the comment.
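To make the `x = 5` row of the rewrite table and the point about `x_row_count` concrete, here is a minimal Rust sketch that evaluates the rewrite against hand-written container statistics. It is not DataFusion's implementation: `ColumnStats`, `unwrapped_eq`, `rewritten_eq`, and `keep_container` are hypothetical names, `None` stands in for a SQL `NULL` statistic, and it assumes that a container is pruned only when the predicate evaluates to exactly `false` (a `NULL` result is treated as "may match").

```rust
// Hypothetical sketch: evaluates the `x = 5` rewrite against hand-written
// statistics. Not DataFusion's actual code; all names are illustrative.

/// Per-container statistics for one column `x`.
/// `None` models a SQL `NULL`, i.e. "statistic unknown".
#[derive(Debug, Clone, Copy)]
struct ColumnStats {
    min: Option<i64>,
    max: Option<i64>,
    null_count: Option<u64>,
    row_count: Option<u64>,
}

/// SQL three-valued AND: `false` wins; otherwise `NULL` if either side is `NULL`.
fn and3(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        _ => None,
    }
}

/// SQL `a <= b` on nullable operands: `NULL` if either side is `NULL`.
fn le3(a: Option<i64>, b: Option<i64>) -> Option<bool> {
    Some(a? <= b?)
}

/// The unwrapped rewrite: `x_min <= v AND v <= x_max`.
fn unwrapped_eq(s: &ColumnStats, v: i64) -> Option<bool> {
    and3(le3(s.min, Some(v)), le3(Some(v), s.max))
}

/// The wrapped rewrite:
/// `CASE WHEN x_null_count = x_row_count THEN false ELSE x_min <= v AND v <= x_max END`
fn rewritten_eq(s: &ColumnStats, v: i64) -> Option<bool> {
    // CASE takes the THEN branch only when the condition is definitely true;
    // a NULL condition (either count unknown) falls through to ELSE.
    let all_null = match (s.null_count, s.row_count) {
        (Some(n), Some(r)) => Some(n == r),
        _ => None,
    };
    if all_null == Some(true) {
        Some(false)
    } else {
        unwrapped_eq(s, v)
    }
}

/// Assumed pruning rule: keep the container unless the result is exactly `false`.
fn keep_container(result: Option<bool>) -> bool {
    result != Some(false)
}

fn main() {
    let all_null = ColumnStats { min: None, max: None, null_count: Some(100), row_count: Some(100) };
    let no_row_count = ColumnStats { min: None, max: None, null_count: Some(100), row_count: None };
    let ranged = ColumnStats { min: Some(10), max: Some(20), null_count: Some(0), row_count: Some(100) };

    // All values NULL: without the CASE the result is NULL, so the container is kept.
    println!("all_null, unwrapped: keep = {}", keep_container(unwrapped_eq(&all_null, 5)));
    // With the CASE the result is `false`, so the container can be pruned.
    println!("all_null, wrapped:   keep = {}", keep_container(rewritten_eq(&all_null, 5)));
    // Row count unknown: same predicate, the ELSE branch is NULL, container kept.
    println!("no_row_count:        keep = {}", keep_container(rewritten_eq(&no_row_count, 5)));
    // 5 lies outside [10, 20]: pruned.
    println!("ranged:              keep = {}", keep_container(rewritten_eq(&ranged, 5)));
}
```

The `no_row_count` case reflects why the predicate text does not need to change: when `x_row_count` is unknown, `x_null_count = x_row_count` evaluates to `NULL`, the `CASE` falls through to the `ELSE` branch, and the outcome matches the unwrapped predicate.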