nathanb9 opened a new pull request, #22319: URL: https://github.com/apache/datafusion/pull/22319
## Which issue does this PR close?

- Closes #22317.

## Rationale for this change

Correlated scalar subqueries with ungrouped aggregates are decorrelated into joins. For unmatched outer rows, the rewritten join naturally produces NULLs on the right side, so DataFusion has compensation logic for aggregates that should return a non-NULL value on empty input.

That compensation previously special-cased `count` by name. As a result, other aggregates with non-NULL empty-input results, such as `regr_count` and `approx_distinct`, incorrectly returned NULL after decorrelation.

## What changes are included in this PR?

This PR updates decorrelation to use each aggregate UDF's `default_value()` instead of hard-coding `count`.

It also adds empty-input defaults for:

- `regr_count`: `UInt64(0)`
- `approx_distinct`: `UInt64(0)`

Regression coverage is added for correlated scalar subqueries using these aggregates in projection expressions and filters.

## Are these changes tested?

Yes.

```bash
cargo fmt --all
cargo test -p datafusion-sqllogictest --test sqllogictests -- subquery.slt
```

## Are there any user-facing changes?

Yes. Queries using `regr_count` or `approx_distinct` in correlated scalar subqueries now return `0` for unmatched outer rows instead of `NULL`, matching the aggregate behavior on empty input.
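To illustrate the compensation described in the rationale, here is a minimal, self-contained sketch. It is a toy model and not DataFusion's optimizer code: the `Agg` enum and the `per_row` and `decorrelated` functions are hypothetical names invented for this example, and `Option<u64>` stands in for a nullable scalar. The only idea taken from the PR is that the value for unmatched outer rows comes from the aggregate's own empty-input default rather than from a check on the function name.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an aggregate function (not DataFusion's trait).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Agg {
    Count,
    Sum,
}

impl Agg {
    /// Result on empty input: `Some(0)` for count, `None` (NULL) for sum.
    pub fn default_value(self) -> Option<u64> {
        match self {
            Agg::Count => Some(0),
            Agg::Sum => None,
        }
    }

    /// Evaluate the aggregate over a slice of non-null values.
    pub fn evaluate(self, values: &[u64]) -> Option<u64> {
        if values.is_empty() {
            return self.default_value();
        }
        match self {
            Agg::Count => Some(values.len() as u64),
            Agg::Sum => Some(values.iter().sum()),
        }
    }
}

/// Reference semantics: run the scalar subquery once per outer row.
pub fn per_row(outer: &[u32], inner: &[(u32, u64)], agg: Agg) -> Vec<(u32, Option<u64>)> {
    outer
        .iter()
        .map(|&k| {
            let matched: Vec<u64> = inner
                .iter()
                .filter(|row| row.0 == k)
                .map(|row| row.1)
                .collect();
            (k, agg.evaluate(&matched))
        })
        .collect()
}

/// Decorrelated form: aggregate the inner side grouped by the join key,
/// left-join it to the outer rows, then compensate for unmatched rows.
pub fn decorrelated(outer: &[u32], inner: &[(u32, u64)], agg: Agg) -> Vec<(u32, Option<u64>)> {
    let mut groups: HashMap<u32, Vec<u64>> = HashMap::new();
    for &(k, v) in inner {
        groups.entry(k).or_default().push(v);
    }
    outer
        .iter()
        .map(|&k| {
            let value = match groups.get(&k) {
                // Matched outer row: use the grouped aggregate result.
                Some(values) => agg.evaluate(values),
                // Unmatched outer row: the join alone would yield NULL, so
                // substitute the aggregate's empty-input default instead.
                None => agg.default_value(),
            };
            (k, value)
        })
        .collect()
}
```

With outer keys `[1, 2, 3]` and inner rows `[(1, 10), (1, 20), (3, 5)]`, both functions agree: the count-like aggregate yields `Some(0)` for key `2`, while the sum yields `None`. Hard-coding the compensation to a single function name would break that agreement for any other aggregate whose empty-input result is non-NULL.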
