yonatan-sevenai opened a new issue, #21098:
URL: https://github.com/apache/datafusion/issues/21098

   ### Describe the bug
   
   When the SQL unparser encounters a SubqueryAlias node whose direct child is 
an Aggregate (without an intermediate Projection), it flattens the subquery 
into a simple table alias, losing the aggregate entirely.
   
   # Root cause
   
   The SubqueryAlias handler in select_to_sql_recursively 
(datafusion/sql/src/unparser/plan.rs) calls
   subquery_alias_inner_query_and_columns (which only unwraps Projection 
children) and unparse_table_scan_pushdown (which only
   handles TableScan/SubqueryAlias/Projection). When both return nothing useful 
for an Aggregate child, the code recurses directly
   into the Aggregate, merging its GROUP BY into the outer SELECT instead of 
emitting a derived subquery.
   
   
   # Additional context
   
   This also affects other plan types that build their own SELECT clauses 
(Window, Sort, Limit, Union) when directly wrapped by
   SubqueryAlias.
   
   
   ### To Reproduce
   
   Manually construct a logical plan with a join where the right side is 
SubqueryAlias > Aggregate:
   
   ```sql
   SELECT j1.j1_string FROM j1 JOIN (SELECT max(j2_id) AS max_id FROM j2) AS b 
ON j1.j1_id = b.max_id
   ```
   
   This bug doesn't manifest when the SQL is parsed from a string, as the 
parser inserts a Projection between the SubqueryAlias and Aggregate, so the 
roundtrip works. The bug only manifests with manually constructed plans where 
SubqueryAlias directly wraps Aggregate.
   
   
   ```rust
   let right_scan = table_scan(Some("j2"), &j2_schema, None)?.build()?;
   let right_agg = LogicalPlanBuilder::from(right_scan)
       .aggregate(vec![] as Vec<Expr>, 
vec![max(col("j2.j2_id")).alias("max_id")])?
       .build()?;
   let right_subquery = subquery_alias(right_agg, "b")?;
   
   let left_scan = table_scan(Some("j1"), &j1_schema, None)?.build()?;
   let plan = LogicalPlanBuilder::from(left_scan)
       .join(right_subquery, JoinType::Inner,
           (vec![Column::from_qualified_name("j1.j1_id")],
            vec![Column::from_qualified_name("b.max_id")]),
           None)?
       .project(vec![col("j1.j1_string")])?
       .build()?;
   
   let sql = Unparser::default().plan_to_sql(&plan)?.to_string();
   ```
   ## Current behavior
   
   ```sql
   SELECT j1.j1_string FROM j1 INNER JOIN j2 AS b ON j1.j1_id = b.max_id
   ``` 
   The aggregate subquery is completely dropped — (SELECT max(j2_id) AS max_id 
FROM j2) AS b becomes just j2 AS b.
   
   ### Expected behavior
   
   ```sql
   SELECT j1.j1_string FROM j1 INNER JOIN (SELECT max(j2.j2_id) AS max_id FROM 
j2) AS b ON j1.j1_id = b.max_id
   ```
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to