yonatan-sevenai opened a new pull request, #21099:
URL: https://github.com/apache/datafusion/pull/21099

   …gregate
   
   When the SQL unparser encountered a SubqueryAlias node whose direct child 
was an Aggregate (or other clause-building plan like Window, Sort, Limit, 
Union), it would flatten the subquery into a simple table alias, losing the 
aggregate entirely.
   
   For example, a plan representing:
     SELECT j1.col FROM j1 JOIN (SELECT max(id) AS m FROM j2) AS b ON j1.id = 
b.m
   
   would unparse to:
     SELECT j1.col FROM j1 INNER JOIN j2 AS b ON j1.id = b.m
   
   dropping the MAX aggregate and the subquery.
   
   Root cause: the SubqueryAlias handler in select_to_sql_recursively would 
call subquery_alias_inner_query_and_columns (which only unwraps Projection 
children) and unparse_table_scan_pushdown (which only handles 
TableScan/SubqueryAlias/Projection). When both returned nothing useful for an 
Aggregate child, the code recursed directly into the Aggregate, merging its 
GROUP BY into the outer SELECT instead of wrapping it in a derived subquery.
   
   The fix adds an early check: if the SubqueryAlias's direct child is a plan 
type that builds its own SELECT clauses (Aggregate, Window, Sort, Limit, 
Union), emit it as a derived subquery via self.derive() with the alias always 
attached, rather than falling through to the recursive path that would flatten 
it.
   
   ## Which issue does this PR close?
   - Closes #21098 
   
   ## Rationale for this change
   
   The SQL unparser silently drops subquery structure when a SubqueryAlias node 
directly wraps an Aggregate (or Window, Sort, Limit, Union). For example, a 
plan representing 
   ```sql 
   SELECT ... FROM j1 JOIN (SELECT max(id) FROM j2) AS b ...
   ``` 
   unparses to 
   ```sql
   SELECT ... FROM j1 JOIN j2 AS b ...
   ```
   losing the aggregate entirely. This produces semantically incorrect SQL.
   
   ## What changes are included in this PR?
   
   In the SubqueryAlias handler within select_to_sql_recursively 
(`datafusion/sql/src/unparser/plan.rs`):
     - Added an early check: if the SubqueryAlias's direct child is a plan type 
that builds its own SELECT clauses (Aggregate, Window, Sort, Limit, Union) and 
cannot be reduced to a table scan, emit it as a derived subquery (SELECT ...) 
AS alias via self.derive() instead of recursing into the child and flattening 
it.
     - Added a helper requires_derived_subquery() that identifies plan types 
requiring this treatment.
   
   ## Are these changes tested?
   
   Yes. A new test test_unparse_manual_join_with_subquery_aggregate is added 
that constructs a SubqueryAlias > Aggregate plan (without an intermediate 
Projection) and asserts the unparsed SQL preserves the MAX() aggregate function 
call. This test fails without the fix. All current unparser tests succeed 
without modification
   
   ## Are there any user-facing changes?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to