nathanb9 opened a new pull request, #22319:
URL: https://github.com/apache/datafusion/pull/22319

   ## Which issue does this PR close?
   
   - Closes #22317.
   
   ## Rationale for this change
   
   Correlated scalar subqueries with ungrouped aggregates are decorrelated into 
joins. For an outer row with no matching inner rows, the rewritten join produces 
NULLs on the right side, so DataFusion has compensation logic for aggregates that 
return a non-NULL value on empty input.
   
   That compensation previously special-cased `count` by name. As a result, 
other aggregates with non-NULL empty-input results, such as `regr_count` and 
`approx_distinct`, incorrectly returned NULL after decorrelation.
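
   Below is a minimal model of the problem. It is not DataFusion's actual rewrite and rests on some simplifying assumptions:

   - The subquery's aggregate has already been computed per correlation key.
   - A left join then looks up each outer key.
   - `None` stands in for SQL NULL.
   - Both aggregates produce the same value for the matched key.

   The helper names `left_join_lookup` and `compensate_by_name` are hypothetical.

   ```rust
   use std::collections::HashMap;

   /// Hypothetical model of the rewritten join: `grouped` holds the aggregate
   /// result per correlation key; outer keys with no entry come back as `None` (NULL).
   fn left_join_lookup(outer_keys: &[i32], grouped: &HashMap<i32, u64>) -> Vec<Option<u64>> {
       outer_keys.iter().map(|k| grouped.get(k).copied()).collect()
   }

   /// Hypothetical model of the old name-based compensation:
   /// only an aggregate named `count` has its NULL replaced with 0.
   fn compensate_by_name(agg_name: &str, joined: Option<u64>) -> Option<u64> {
       match (agg_name, joined) {
           ("count", None) => Some(0),
           (_, v) => v,
       }
   }

   fn main() {
       // Key 1 has matching inner rows; key 2 has none (an unmatched outer row).
       let grouped: HashMap<i32, u64> = HashMap::from([(1, 3)]);
       let joined = left_join_lookup(&[1, 2], &grouped);

       let count: Vec<_> = joined.iter().map(|v| compensate_by_name("count", *v)).collect();
       let regr_count: Vec<_> = joined.iter().map(|v| compensate_by_name("regr_count", *v)).collect();

       println!("count      = {:?}", count); // prints: count      = [Some(3), Some(0)]
       println!("regr_count = {:?}", regr_count); // prints: regr_count = [Some(3), None]
   }
   ```

   The `None` in the second line is the NULL described above. `regr_count` also returns 0 on empty input, but the name match never fires for it.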
   
   ## What changes are included in this PR?
   
   This PR updates decorrelation to use each aggregate UDF's `default_value()` 
instead of hard-coding `count`.
   
   It also adds empty-input defaults for:
   
   - `regr_count`: `UInt64(0)`
   - `approx_distinct`: `UInt64(0)`
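
   Below is a sketch of the idea behind the change. Each aggregate reports its own empty-input result through a `default_value()` hook, and compensation asks the aggregate rather than matching on its name. It is built on some illustrative assumptions:

   - The trait name `EmptyInputDefault` and the `compensate` function are hypothetical.
   - Every value is modeled as a `u64`.

   In DataFusion the hook belongs to the aggregate UDF and returns a typed scalar value, such as `UInt64(0)` for `regr_count`.

   ```rust
   /// Hypothetical, simplified stand-in for an aggregate UDF's empty-input hook.
   /// `None` models SQL NULL; all values are modeled as `u64` for brevity.
   trait EmptyInputDefault {
       fn name(&self) -> &str;
       /// Result of the aggregate over zero input rows; NULL unless overridden.
       fn default_value(&self) -> Option<u64> {
           None
       }
   }

   struct Count;
   struct RegrCount;
   struct ApproxDistinct;
   struct Sum;

   impl EmptyInputDefault for Count {
       fn name(&self) -> &str { "count" }
       fn default_value(&self) -> Option<u64> { Some(0) }
   }

   impl EmptyInputDefault for RegrCount {
       fn name(&self) -> &str { "regr_count" }
       fn default_value(&self) -> Option<u64> { Some(0) }
   }

   impl EmptyInputDefault for ApproxDistinct {
       fn name(&self) -> &str { "approx_distinct" }
       fn default_value(&self) -> Option<u64> { Some(0) }
   }

   // `sum` keeps the trait's default: NULL on empty input.
   impl EmptyInputDefault for Sum {
       fn name(&self) -> &str { "sum" }
   }

   /// Compensation driven by the aggregate itself: a matched value passes
   /// through, and an unmatched row (NULL after the join) gets the empty-input result.
   fn compensate(agg: &dyn EmptyInputDefault, joined: Option<u64>) -> Option<u64> {
       joined.or_else(|| agg.default_value())
   }

   fn main() {
       let aggs: [&dyn EmptyInputDefault; 4] = [&Count, &RegrCount, &ApproxDistinct, &Sum];
       for agg in aggs {
           println!("{:<15} unmatched -> {:?}", agg.name(), compensate(agg, None));
       }
   }
   ```

   With this design, adding a new aggregate with a non-NULL empty-input result only requires overriding its own hook. The decorrelation code does not change. In this model, `count`, `regr_count` and `approx_distinct` yield `Some(0)` for an unmatched row, while `sum` stays `None`.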
   
   Regression tests cover correlated scalar subqueries that use these 
aggregates both in projection expressions and in filters.
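
   Filters are affected as well as projections. In SQL, comparing NULL to anything yields NULL, and `WHERE` keeps only rows whose predicate is TRUE. The sketch below models three-valued logic with `Option<bool>`; `sql_eq` and `where_keeps` are hypothetical helpers.

   It applies a filter of the form `(SELECT regr_count(...) ...) = 0` to an unmatched outer row:

   - When the subquery leaks NULL, the row is dropped.
   - Once the empty-input default of 0 is applied, the row is kept.

   ```rust
   /// SQL-style equality: if either operand is NULL (`None`), the result is NULL.
   fn sql_eq(a: Option<u64>, b: Option<u64>) -> Option<bool> {
       match (a, b) {
           (Some(x), Some(y)) => Some(x == y),
           _ => None,
       }
   }

   /// `WHERE` keeps a row only when the predicate is TRUE (not FALSE, not NULL).
   fn where_keeps(pred: Option<bool>) -> bool {
       pred == Some(true)
   }

   fn main() {
       // Scalar subquery result for an outer row with no matching inner rows.
       let before_fix: Option<u64> = None; // NULL leaked through the join
       let after_fix: Option<u64> = Some(0); // empty-input default applied

       println!("before: kept = {}", where_keeps(sql_eq(before_fix, Some(0)))); // prints: before: kept = false
       println!("after:  kept = {}", where_keeps(sql_eq(after_fix, Some(0)))); // prints: after:  kept = true
   }
   ```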
   
   ## Are these changes tested?
   
   Yes.
   
   ```bash
   cargo fmt --all
   cargo test -p datafusion-sqllogictest --test sqllogictests -- subquery.slt
   ```
   
   ## Are there any user-facing changes?
   
   Yes. Queries using `regr_count` or `approx_distinct` in correlated scalar 
subqueries now return `0` for unmatched outer rows instead of `NULL`, matching 
each aggregate's result on empty input.
   

