yonatan-sevenai opened a new pull request, #21099:
URL: https://github.com/apache/datafusion/pull/21099
…gregate
When the SQL unparser encountered a SubqueryAlias node whose direct child
was an Aggregate (or other clause-building plan like Window, Sort, Limit,
Union), it would flatten the subquery into a simple table alias, losing the
aggregate entirely.
For example, a plan representing:
SELECT j1.col FROM j1 JOIN (SELECT max(id) AS m FROM j2) AS b ON j1.id =
b.m
would unparse to:
SELECT j1.col FROM j1 INNER JOIN j2 AS b ON j1.id = b.m
dropping the MAX aggregate and the subquery.
Root cause: the SubqueryAlias handler in select_to_sql_recursively would
call subquery_alias_inner_query_and_columns (which only unwraps Projection
children) and unparse_table_scan_pushdown (which only handles
TableScan/SubqueryAlias/Projection). When both returned nothing useful for an
Aggregate child, the code recursed directly into the Aggregate, merging its
GROUP BY into the outer SELECT instead of wrapping it in a derived subquery.
The fix adds an early check: if the SubqueryAlias's direct child is a plan
type that builds its own SELECT clauses (Aggregate, Window, Sort, Limit,
Union), emit it as a derived subquery via self.derive() with the alias always
attached, rather than falling through to the recursive path that would flatten
it.
## Which issue does this PR close?
- Closes #21098
## Rationale for this change
The SQL unparser silently drops subquery structure when a SubqueryAlias node
directly wraps an Aggregate (or Window, Sort, Limit, Union). For example, a
plan representing
```sql
SELECT ... FROM j1 JOIN (SELECT max(id) FROM j2) AS b ...
```
unparses to
```sql
SELECT ... FROM j1 JOIN j2 AS b ...
```
losing the aggregate entirely. This produces semantically incorrect SQL.
## What changes are included in this PR?
In the SubqueryAlias handler within select_to_sql_recursively
(`datafusion/sql/src/unparser/plan.rs`):
- Added an early check: if the SubqueryAlias's direct child is a plan type
that builds its own SELECT clauses (Aggregate, Window, Sort, Limit, Union) and
cannot be reduced to a table scan, emit it as a derived subquery (SELECT ...)
AS alias via self.derive() instead of recursing into the child and flattening
it.
- Added a helper requires_derived_subquery() that identifies plan types
requiring this treatment.
## Are these changes tested?
Yes. A new test test_unparse_manual_join_with_subquery_aggregate is added
that constructs a SubqueryAlias > Aggregate plan (without an intermediate
Projection) and asserts the unparsed SQL preserves the MAX() aggregate function
call. This test fails without the fix. All current unparser tests succeed
without modification
## Are there any user-facing changes?
No.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]