viirya commented on code in PR #9223:
URL: https://github.com/apache/arrow-datafusion/pull/9223#discussion_r1527197990
##########
datafusion/core/src/physical_optimizer/pruning.rs:
##########
@@ -326,28 +348,46 @@ pub trait PruningStatistics {
/// LiteralGuarantees are not satisfied
///
/// **Second Pass**: Evaluates the rewritten expression using the
-/// min/max/null_counts values for each column for each container. For any
+/// min/max/null_counts/row_counts values for each column for each container. For any
/// container that this expression evaluates to `false`, it rules out those
/// containers.
///
-/// For example, given the predicate, `x = 5 AND y = 10`, if we know `x` is
-/// between `1 and 100` and we know that `y` is between `4` and `7`, the input
-/// statistics might look like
+///
+/// ### Example 1
+/// Given the predicate, `x = 5 AND y = 10`, if we know that for a given container, `x` is
+/// between `1 and 100` and we know that `y` is between `4` and `7`, we know nothing about
+/// the null count and row count of `x` and `y`, the input statistics might look like:
///
/// Column | Value
/// -------- | -----
/// `x_min` | `1`
/// `x_max` | `100`
+/// `x_null_count` | `null`
+/// `x_row_count` | `null`
/// `y_min` | `4`
/// `y_max` | `7`
+/// `y_null_count` | `null`
+/// `y_row_count` | `null`
///
/// The rewritten predicate would look like
///
-/// `x_min <= 5 AND 5 <= x_max AND y_min <= 10 AND 10 <= y_max`
+/// ```sql
+/// CASE
+/// WHEN x_null_count = x_row_count THEN false
+/// ELSE x_min <= 5 AND 5 <= x_max
+/// END
+/// AND
+/// CASE
+/// WHEN y_null_count = y_row_count THEN false
+/// ELSE y_min <= 10 AND 10 <= y_max
+/// END
+/// ```
Review Comment:
I wonder why we need to put `x_null_count` and `x_row_count` into the
rewritten predicate if we know nothing about them? The previous rewritten
predicate looks more correct and concise to me.
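
To make the question concrete, here is a minimal sketch of how I read the two forms, assuming standard SQL three-valued logic where `None` stands for `NULL`. The names `ColumnStats`, `old_predicate`, `new_predicate` and the `sql_*` helpers are hypothetical and only for illustration; they are not DataFusion APIs.

```rust
/// Hypothetical per-container statistics for one column; `None` models SQL NULL.
struct ColumnStats {
    min: Option<i64>,
    max: Option<i64>,
    null_count: Option<u64>,
    row_count: Option<u64>,
}

/// SQL `=`: the result is NULL if either side is NULL.
fn sql_eq(a: Option<u64>, b: Option<u64>) -> Option<bool> {
    match (a, b) {
        (Some(a), Some(b)) => Some(a == b),
        _ => None,
    }
}

/// SQL `<=`: the result is NULL if either side is NULL.
fn sql_le(a: Option<i64>, b: Option<i64>) -> Option<bool> {
    match (a, b) {
        (Some(a), Some(b)) => Some(a <= b),
        _ => None,
    }
}

/// SQL `AND`: false dominates, otherwise NULL propagates.
fn sql_and(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        _ => None,
    }
}

/// Previous form for `col = lit`: `col_min <= lit AND lit <= col_max`.
fn old_predicate(s: &ColumnStats, lit: i64) -> Option<bool> {
    sql_and(sql_le(s.min, Some(lit)), sql_le(Some(lit), s.max))
}

/// New form: `CASE WHEN col_null_count = col_row_count THEN false ELSE <old> END`.
fn new_predicate(s: &ColumnStats, lit: i64) -> Option<bool> {
    // A CASE branch is taken only when its condition is TRUE;
    // a NULL condition falls through to ELSE.
    if sql_eq(s.null_count, s.row_count) == Some(true) {
        Some(false)
    } else {
        old_predicate(s, lit)
    }
}
```

Under these assumptions the two forms agree whenever the counts are `null`, as in Example 1. They differ only for a container whose counts are known and equal (every value is null): the old form evaluates to `NULL` and the new one to `false`.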