adriangb opened a new pull request, #14297: URL: https://github.com/apache/datafusion/pull/14297
Currently, pruning predicates may return `NULL` to indicate "this container should be included", thus using `NULL` as a *truthy* value. That is quite confusing, as explained in the various comments addressing it. It is also a big inconvenience for anything using `PredicateRewriter`, because you have to handle nulls yourself: if you pipe the result into a `WHERE` clause, you silently get the wrong result, since `WHERE` drops rows whose predicate evaluates to `NULL`.

The workaround is to wrap the expression returned by `PredicateRewriter` in `(<expr>) IS NOT FALSE`, which makes `NULL` truthy. This has the unfortunate consequence of turning a simple binary expression into a [non-sargable one](https://en.wikipedia.org/wiki/Sargable). That is a problem for systems that may want to store statistics in a DBMS with indexes. For example, if I add an index on `col1_min`, it can't be used, because the `(...) IS NOT FALSE` wrapper prevents anything from being pushed down into indexes.

This PR addresses both problems by introducing checks for nulls in the stats columns in the right places, such that we can now promise that the predicates return `true`, never `NULL`, when a container should be included.

Since we make no promises about the produced predicate, this should not be a breaking change. This does not read any extra columns, and the null checks should be very cheap, so I do not expect this to have any performance impact on systems evaluating statistics in memory (like DataFusion does internally for Parquet row group and page statistics).
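
To make the `NULL` handling concrete, here is a minimal Python sketch of SQL three-valued logic, with `None` standing in for `NULL`. It is an illustration only: the `containers` table, the helper names, and the exact shape of the null-checked predicate are assumptions made for this example, not the expressions this PR actually generates. It compares a naive rewrite of `col1 = 5`, the `IS NOT FALSE` workaround, and a variant with explicit null checks on the stats columns.

```python
# SQL-style three-valued logic; None plays the role of NULL.
def and3(a, b):
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def or3(a, b):
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def le3(a, b):
    # Any comparison involving NULL yields NULL.
    if a is None or b is None:
        return None
    return a <= b

def where(rows, predicate):
    # WHERE keeps a row only when the predicate is exactly TRUE.
    return [r["name"] for r in rows if predicate(r) is True]

# Hypothetical statistics table: one row per container.
containers = [
    {"name": "a", "col1_min": 1, "col1_max": 10},       # may contain 5
    {"name": "b", "col1_min": 20, "col1_max": 30},      # cannot contain 5
    {"name": "c", "col1_min": None, "col1_max": None},  # stats missing
]

def naive(r):
    # col1_min <= 5 AND 5 <= col1_max
    return and3(le3(r["col1_min"], 5), le3(5, r["col1_max"]))

def wrapped(r):
    # (col1_min <= 5 AND 5 <= col1_max) IS NOT FALSE
    return naive(r) is not False

def null_checked(r):
    # col1_min IS NULL OR col1_max IS NULL
    #   OR (col1_min <= 5 AND 5 <= col1_max)
    missing = or3(r["col1_min"] is None, r["col1_max"] is None)
    return or3(missing, naive(r))

print(where(containers, naive))         # container "c" is silently dropped
print(where(containers, wrapped))       # "c" kept, whole expression wrapped
print(where(containers, null_checked))  # "c" kept, plain comparisons remain
```

In the null-checked form the comparisons on `col1_min` and `col1_max` stay as ordinary binary expressions, which is what gives a database the chance to use an index on those columns. Whether a given planner actually does so depends on the system.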