alamb commented on PR #13795: URL: https://github.com/apache/datafusion/pull/13795#issuecomment-2548414526
I thought more about this in the 🚿 and double-checked the logic.

TLDR: I think using `NOT` (as in this PR) is ok. The rationale is that if evaluating `column_count = null_count` is null, it means nothing is known about the null counts. However, since `null AND ...` still resolves to `false` if the `...` is false (aka we can prove the predicate is not true by other means), the requirements of the pruning predicate will still be satisfied:

https://github.com/apache/datafusion/blob/e665115893e6282d592df71657e9f5b5855d1617/datafusion/physical-optimizer/src/pruning.rs#L723-L726

So upon more thought I think the theory behind this PR is sound ✅

I need to review the code more carefully and ensure we have a test that has unknown (unspecified) column count but the value can be proven true by other min/max ranges, but otherwise it should be good to go.
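
To make the three-valued logic argument concrete, here is a minimal sketch in Rust. It is not DataFusion's implementation: `Tri`, `eq3`, `not3`, `and3`, and `can_prune` are hypothetical names invented for illustration, and `None` stands in for SQL NULL. The sketch assumes the usual pruning contract, where a container may be skipped only when the predicate is definitely `false`.

```rust
/// SQL-style three-valued boolean; `None` models NULL (unknown).
/// Hypothetical type for illustration, not a DataFusion type.
type Tri = Option<bool>;

/// Three-valued equality: the result is NULL if either side is unknown.
fn eq3(a: Option<u64>, b: Option<u64>) -> Tri {
    match (a, b) {
        (Some(x), Some(y)) => Some(x == y),
        _ => None,
    }
}

/// Three-valued NOT: NULL stays NULL.
fn not3(a: Tri) -> Tri {
    a.map(|v| !v)
}

/// Three-valued AND: `false` dominates, so `NULL AND false` is `false`.
fn and3(a: Tri, b: Tri) -> Tri {
    match (a, b) {
        (Some(false), _) | (_, Some(false)) => Some(false),
        (Some(true), Some(true)) => Some(true),
        _ => None,
    }
}

/// Assumed pruning contract: skip a container only when the predicate
/// is definitely false; `true` and NULL both mean "keep it".
fn can_prune(predicate: Tri) -> bool {
    predicate == Some(false)
}
```

With unknown counts, `not3(eq3(None, None))` is `None`. Combining it with a min/max check that is `Some(false)` gives `and3(None, Some(false)) == Some(false)`, so the container can still be pruned, while combining it with `Some(true)` gives `None` and the container is kept.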