yonatan-sevenai opened a new issue, #21098:
URL: https://github.com/apache/datafusion/issues/21098
### Describe the bug
When the SQL unparser encounters a SubqueryAlias node whose direct child is
an Aggregate (without an intermediate Projection), it flattens the subquery
into a simple table alias, losing the aggregate entirely.
# Root cause
The SubqueryAlias handler in select_to_sql_recursively
(datafusion/sql/src/unparser/plan.rs) calls
subquery_alias_inner_query_and_columns (which only unwraps Projection
children) and unparse_table_scan_pushdown (which only
handles TableScan/SubqueryAlias/Projection). When both return nothing useful
for an Aggregate child, the code recurses directly
into the Aggregate, merging its GROUP BY into the outer SELECT instead of
emitting a derived subquery.
# Additional context
This also affects other plan types that build their own SELECT clauses
(Window, Sort, Limit, Union) when directly wrapped by
SubqueryAlias.
### To Reproduce
Manually construct a logical plan with a join where the right side is
SubqueryAlias > Aggregate:
```sql
SELECT j1.j1_string FROM j1 JOIN (SELECT max(j2_id) AS max_id FROM j2) AS b
ON j1.j1_id = b.max_id
```
This bug doesn't manifest when the SQL is parsed from a string, as the
parser inserts a Projection between the SubqueryAlias and Aggregate, so the
roundtrip works. The bug only manifests with manually constructed plans where
SubqueryAlias directly wraps Aggregate.
```rust
let right_scan = table_scan(Some("j2"), &j2_schema, None)?.build()?;
let right_agg = LogicalPlanBuilder::from(right_scan)
.aggregate(vec![] as Vec<Expr>,
vec![max(col("j2.j2_id")).alias("max_id")])?
.build()?;
let right_subquery = subquery_alias(right_agg, "b")?;
let left_scan = table_scan(Some("j1"), &j1_schema, None)?.build()?;
let plan = LogicalPlanBuilder::from(left_scan)
.join(right_subquery, JoinType::Inner,
(vec![Column::from_qualified_name("j1.j1_id")],
vec![Column::from_qualified_name("b.max_id")]),
None)?
.project(vec![col("j1.j1_string")])?
.build()?;
let sql = Unparser::default().plan_to_sql(&plan)?.to_string();
```
## Current behavior
```sql
SELECT j1.j1_string FROM j1 INNER JOIN j2 AS b ON j1.j1_id = b.max_id
```
The aggregate subquery is completely dropped — (SELECT max(j2_id) AS max_id
FROM j2) AS b becomes just j2 AS b.
### Expected behavior
```sql
SELECT j1.j1_string FROM j1 INNER JOIN (SELECT max(j2.j2_id) AS max_id FROM
j2) AS b ON j1.j1_id = b.max_id
```
### Additional context
_No response_
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]