yonatan-sevenai opened a new issue, #21374:
URL: https://github.com/apache/datafusion/issues/21374

   ### Describe the bug
   
   When the SQL unparser encounters a Projection → Limit → Aggregate (or 
Projection → Sort → Aggregate) plan shape where the Aggregate's aliases are 
inlined (no intermediate Projection between
     Limit and Aggregate), it emits the aggregate expressions twice — once in 
the outer SELECT and once inside a spurious derived subquery.
   
     This happens because:
     1. reconstruct_select_statement reaches through the Limit/Sort via 
`find_agg_node_within_select` to claim the Aggregate, setting the SELECT items 
and GROUP BY
     2. When recursion then hits the Limit/Sort arm, it sees `already_projected 
== true` and wraps everything below in a derived subquery
     3. The Aggregate ends up emitted in both the outer SELECT and inside the 
derived subquery
     
   PR with fix coming momentarily
   
   ### To Reproduce
   
   1. Parser-roundtrip query (works correctly because the parser inserts an 
intermediate Projection)
   
   ```
     SELECT __agg_0 AS "min(j1_id)", __agg_1 AS "max(j1_id)"
     FROM (
       SELECT min(j1_rename) AS __agg_0, max(j1_rename) AS __agg_1
       FROM (SELECT j1_id AS j1_rename FROM j1) AS bla
       LIMIT 20
     )
   
   ```
     This roundtrips correctly because the parser produces a plan with a 
Projection between the Limit and 
   Aggregate:
   
   ```
     Projection: __agg_0 AS min(j1_id), __agg_1 AS max(j1_id)
       Limit: skip=0, fetch=20
         Projection: min(bla.j1_rename) AS __agg_0, max(bla.j1_rename) AS 
__agg_1  ← parser inserts this
           Aggregate: groupBy=[[]], aggr=[[min(bla.j1_rename), 
max(bla.j1_rename)]]
             SubqueryAlias: bla
   ```
   
     2. Manually constructed plan (exposes the bug)
   
     When the aliases are inlined into the Aggregate (no intermediate 
Projection), as optimizers or plan builders may produce:
   
   ```
     Projection: __agg_0 AS max1(j1_id), __agg_1 AS max2(j1_id)
       Limit: skip=0, fetch=20
         Aggregate: groupBy=[[]], aggr=[[max(bla.j1_rename) AS __agg_0, 
max(bla.j1_rename) AS __agg_1]]
           SubqueryAlias: bla
             Projection: j1.j1_id AS j1_rename
               TableScan: j1
   
   ```
     Actual output:
    ```
    SELECT max(bla.j1_rename) AS "max1(j1_id)", max(bla.j1_rename) AS 
"max2(j1_id)"
     FROM (SELECT max(bla.j1_rename) AS __agg_0, max(bla.j1_rename) AS __agg_1
           FROM (SELECT j1.j1_id AS j1_rename FROM j1) AS bla LIMIT 20)
   
   ```
     The aggregate is emitted twice and the outer SELECT references 
bla.j1_rename which is out of scope.
   
     3. Same bug with Sort instead of Limit
   
   ```
     Projection: __agg_0 AS max1(j1_id)
       Sort: __agg_0 ASC
         Aggregate: groupBy=[[]], aggr=[[max(bla.j1_rename) AS __agg_0]]
           SubqueryAlias: bla
   
   ```
     Produces the same spurious derived subquery pattern.
   
   ### Expected behavior
   
   The unparser should emit a single SELECT statement:
   
    ```
    SELECT max(bla.j1_rename) AS "max1(j1_id)", max(bla.j1_rename) AS 
"max2(j1_id)"
     FROM (SELECT j1.j1_id AS j1_rename FROM j1) AS bla
     LIMIT 20
   
   ```
     When the Projection claims an Aggregate by reaching through a Limit or 
Sort, those intermediate nodes should be folded into the same SELECT statement 
rather than triggering a derived
     subquery.
   
   ### Additional context
   
    The fix is to detect in the Projection arm when 
`reconstruct_select_statement `found an Aggregate and the Projection's direct 
child is a Limit or Sort. In that case, fold the Limit/Sort
     clauses (LIMIT/OFFSET/ORDER BY) into the current query and recurse into 
the Limit/Sort's child, skipping the node so it never reaches the 
already_projected guard.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to