kosiew opened a new pull request, #20961: URL: https://github.com/apache/datafusion/pull/20961
## Which issue does this PR close?

* Part of #20002

## Rationale for this change

`PushDownFilter` can spend a disproportionate amount of planning time inferring predicates across joins. One expensive path is `is_restrict_null_predicate`, which falls back to compiling and evaluating the predicate against a null-filled schema to decide whether a predicate is null-rejecting.

For predicates that reference columns outside the join-key set, that evaluation cannot succeed with the synthetic null schema built for join columns only. In practice, callers already treat evaluation failures as non-restricting, but we still pay the full cost of the physical-expression compilation and evaluation path first.

This change adds a cheap guard to detect predicates that reference columns outside the allowed join columns and returns `false` early. That preserves the existing behavior while avoiding unnecessary work in a hot optimizer path.

## What changes are included in this PR?

This PR makes two focused changes:

1. In `is_restrict_null_predicate`, collect the join columns into a `HashSet` and add a fast-path check that verifies whether the predicate only references those columns.
2. If the predicate references any non-join column, return `Ok(false)` immediately instead of attempting null-evaluation.

Additionally:

* The evaluated join-column set is reused for the fallback `evaluate_expr_with_null_column` path.
* `InferredPredicates::insert_inferred_predicate` is simplified to use `.unwrap_or(false)` when consuming `is_restrict_null_predicate`, which matches the prior effective behavior of treating errors as non-restricting.
* A regression test is added for a predicate like `a > b`, where `b` is outside the join-key set, to verify the fast path returns `false`.

## Are these changes tested?

Yes. A test case was added to cover the scenario where a predicate references a column outside the join key set:

* `a > b` now explicitly verifies that `is_restrict_null_predicate` returns `false`.

This exercises the new early-return path and protects against regressions in predicate analysis behavior.

## Are there any user-facing changes?

No. This change is an internal optimizer performance improvement and does not change public APIs or intended query results.

## LLM-generated code disclosure

This PR includes LLM-generated code and comments. All LLM-generated content has been manually reviewed and tested.
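
## Illustrative sketch of the fast path

The guard described under "What changes are included in this PR?" can be pictured with a small, self-contained model. This is not the DataFusion implementation: `ToyExpr`, `Value`, `col`, `references_only`, `eval_with_null_columns`, and `is_restrict_null` are hypothetical names invented for illustration, and the rule that a predicate counts as null-restricting when it evaluates to NULL or `false` is an assumption of this sketch. The toy evaluator fails on columns missing from the null schema, standing in for the evaluation failure the PR avoids paying for.

```rust
use std::collections::HashSet;

/// Toy expression tree. A hypothetical stand-in for DataFusion's `Expr`.
#[derive(Debug, Clone)]
enum ToyExpr {
    Column(String),
    Literal(i64),
    Gt(Box<ToyExpr>, Box<ToyExpr>),
    IsNull(Box<ToyExpr>),
}

/// Result of evaluating a toy expression.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Null,
    Int(i64),
    Bool(bool),
}

/// Convenience constructor for a column reference.
fn col(name: &str) -> ToyExpr {
    ToyExpr::Column(name.to_string())
}

/// Cheap guard: true when every column referenced by `expr` is in `allowed`.
fn references_only(expr: &ToyExpr, allowed: &HashSet<&str>) -> bool {
    match expr {
        ToyExpr::Column(name) => allowed.contains(name.as_str()),
        ToyExpr::Literal(_) => true,
        ToyExpr::Gt(l, r) => references_only(l, allowed) && references_only(r, allowed),
        ToyExpr::IsNull(inner) => references_only(inner, allowed),
    }
}

/// "Expensive" path: evaluate with every allowed column set to NULL.
/// Columns outside `allowed` are missing from the null schema, so they fail.
fn eval_with_null_columns(expr: &ToyExpr, allowed: &HashSet<&str>) -> Result<Value, String> {
    match expr {
        ToyExpr::Column(name) if allowed.contains(name.as_str()) => Ok(Value::Null),
        ToyExpr::Column(name) => Err(format!("column {} is not in the null schema", name)),
        ToyExpr::Literal(v) => Ok(Value::Int(*v)),
        ToyExpr::Gt(l, r) => {
            let lv = eval_with_null_columns(l, allowed)?;
            let rv = eval_with_null_columns(r, allowed)?;
            match (lv, rv) {
                (Value::Int(a), Value::Int(b)) => Ok(Value::Bool(a > b)),
                // Any comparison involving NULL yields NULL.
                (Value::Null, _) | (_, Value::Null) => Ok(Value::Null),
                _ => Err("type mismatch in comparison".to_string()),
            }
        }
        ToyExpr::IsNull(inner) => {
            Ok(Value::Bool(eval_with_null_columns(inner, allowed)? == Value::Null))
        }
    }
}

/// Sketch of the guarded check: bail out before evaluation when the
/// predicate references a column outside the join-column set.
fn is_restrict_null(expr: &ToyExpr, join_cols: &[&str]) -> Result<bool, String> {
    let allowed: HashSet<&str> = join_cols.iter().copied().collect();
    if !references_only(expr, &allowed) {
        return Ok(false); // fast path: evaluation could not succeed anyway
    }
    // Fallback path reuses the same column set (assumed rule: NULL or false).
    let value = eval_with_null_columns(expr, &allowed)?;
    Ok(matches!(value, Value::Null | Value::Bool(false)))
}

fn main() {
    let join_cols = ["a"];
    let a_gt_b = ToyExpr::Gt(Box::new(col("a")), Box::new(col("b")));
    let a_gt_1 = ToyExpr::Gt(Box::new(col("a")), Box::new(ToyExpr::Literal(1)));
    let a_is_null = ToyExpr::IsNull(Box::new(col("a")));

    for (label, expr) in [("a > b", &a_gt_b), ("a > 1", &a_gt_1), ("a IS NULL", &a_is_null)] {
        // Callers treat errors as non-restricting, mirroring `.unwrap_or(false)`.
        let restricting = is_restrict_null(expr, &join_cols).unwrap_or(false);
        println!("{}: null-restricting = {}", label, restricting);
    }
}
```

With `a` as the only join column, `a > b` takes the early return and reports `false`, `a > 1` evaluates to NULL and reports `true`, and `a IS NULL` evaluates to `true` and reports `false`. Without the guard, `a > b` would reach the evaluator, fail on `b`, and still end up as `false` through `.unwrap_or(false)`, which is why the early return preserves behavior while skipping the work.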
