viirya commented on code in PR #9223:
URL: https://github.com/apache/arrow-datafusion/pull/9223#discussion_r1527197640


##########
datafusion/core/src/physical_optimizer/pruning.rs:
##########
@@ -318,6 +327,19 @@ pub trait PruningStatistics {
 /// `x = 5 AND y = 10` | `x_min <= 5 AND 5 <= x_max AND y_min <= 10 AND 10 <= y_max`
 /// `x IS NULL` | `x_null_count > 0`
 ///
+/// In addition, for a given column `x`, the `x_null_count` and `x_row_count` will
+/// be compared using a `CASE` statement to wrap the rewritten predicate to handle
+/// the case where the column `x` is known to be all `NULL`s. Note this
+/// is different from knowing nothing about the column `x`, which confusingly is
+/// encoded by returning `NULL` for the min/max values from [`PruningStatistics::min_values`].
+///
+/// Original Predicate | Rewritten Predicate
+/// ------------------ | --------------------
+/// `x = 5` | `CASE WHEN x_null_count = x_row_count THEN false ELSE x_min <= 5 AND 5 <= x_max END`
+/// `x < 5` | `CASE WHEN x_null_count = x_row_count THEN false ELSE x_max < 5 END`
+/// `x = 5 AND y = 10` | `CASE WHEN x_null_count = x_row_count THEN false ELSE x_min <= 5 AND 5 <= x_max END AND CASE WHEN y_null_count = y_row_count THEN false ELSE y_min <= 10 AND 10 <= y_max END`
+/// `x IS NULL` | `CASE WHEN x_null_count = x_row_count THEN false ELSE x_null_count > 0 END`
+///


Review Comment:
   Hmm, I'm confused by this section of rewritten predicates. The original predicates are the same as in the table above, but their rewritten forms are different. Do you mean that the rewritten predicate will be different if `x_row_count` is available?
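
   For reference, one possible reading of the two tables for `x = 5` is sketched below. This is only an illustration of the SQL semantics of the quoted `CASE` expression, not DataFusion's actual implementation: `ColumnStats`, `eq_basic`, `eq_guarded`, and `can_prune` are hypothetical names, and the sketch assumes that an unknown statistic behaves like SQL `NULL` and that a container is skipped only when the rewritten predicate evaluates to `false`.

```rust
/// Hypothetical per-container statistics for a single column `x`.
/// `None` stands for "unknown", i.e. SQL NULL.
#[derive(Debug, Clone, Copy)]
struct ColumnStats {
    min: Option<i64>,
    max: Option<i64>,
    null_count: Option<u64>,
    row_count: Option<u64>,
}

/// SQL three-valued AND: false wins over NULL, NULL wins over true.
fn and3(a: Option<bool>, b: Option<bool>) -> Option<bool> {
    match (a, b) {
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        _ => None,
    }
}

/// First table: `x = v` becomes `x_min <= v AND v <= x_max`.
fn eq_basic(s: &ColumnStats, v: i64) -> Option<bool> {
    and3(s.min.map(|min| min <= v), s.max.map(|max| v <= max))
}

/// Second table:
/// `CASE WHEN x_null_count = x_row_count THEN false ELSE <basic rewrite> END`.
/// A NULL `WHEN` condition is not true, so the `ELSE` branch is taken.
fn eq_guarded(s: &ColumnStats, v: i64) -> Option<bool> {
    let all_null = match (s.null_count, s.row_count) {
        (Some(nulls), Some(rows)) => nulls == rows,
        _ => false, // either count unknown: fall through to ELSE
    };
    if all_null {
        Some(false)
    } else {
        eq_basic(s, v)
    }
}

/// Assumption: a container may be skipped only on a definite `false`;
/// `true` and NULL both mean "it may contain matching rows".
fn can_prune(result: Option<bool>) -> bool {
    result == Some(false)
}
```

   Under these assumptions, a container whose column is entirely `NULL` (for example `null_count = row_count = 100` with unknown min/max) evaluates to `NULL` with `eq_basic` and is kept, but evaluates to `false` with `eq_guarded` and can be skipped. When `row_count` is unknown, the two forms give the same result in this sketch.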