yonatan-sevenai opened a new pull request, #21375:
URL: https://github.com/apache/datafusion/pull/21375
## Which issue does this PR close?
- Closes #21374
## Rationale for this change
When the SQL unparser encounters a Projection → Limit → Aggregate or
Projection → Sort → Aggregate plan shape where aggregate aliases are inlined
(no intermediate Projection between
Limit/Sort and Aggregate), it emits the aggregate expressions twice - once
in the outer SELECT and once inside a spurious derived subquery. The outer
SELECT also references columns that are
out of scope.
This plan shape doesn't occur from the SQL parser (which inserts an
intermediate Projection), but optimizers and plan builders can produce it.
## What changes are included in this PR?
datafusion/sql/src/unparser/plan.rs:
- reconstruct_select_statement now returns Result<bool> - true when it
found and claimed an Aggregate node for the current SELECT.
- In the Projection arm of select_to_sql_recursively, after claiming an
Aggregate, if the Projection's direct child is a Limit or Sort, fold its
clauses (LIMIT/OFFSET or ORDER BY/LIMIT) into
the current query and recurse into the child's input, skipping the
Limit/Sort node. This prevents the Limit/Sort arm from seeing already_projected
and wrapping everything in a spurious
derived subquery.
datafusion/sql/tests/cases/plan_to_sql.rs:
- roundtrip_aggregate_over_subquery — roundtrip test for the
parser-generated plan shape (Projection between Limit and Aggregate),
confirming it still works.
- roundtrip_subquery_aggregate_with_column_alias - roundtrip test for
SELECT id FROM (SELECT max(j1_id) FROM j1) AS c(id).
- test_unparse_aggregate_over_subquery_no_inner_proj - manually
constructed Projection → Limit → Aggregate plan, verifying the Limit is folded
into the outer SELECT.
- test_unparse_aggregate_no_outer_rename - manually constructed Projection
→ Aggregate with no outer rename, verifying aggregate aliases are preserved.
- test_unparse_aggregate_with_sort_no_inner_proj - manually constructed
Projection → Sort → Aggregate plan, verifying the Sort is folded into the outer
SELECT.
## Are these changes tested?
Yes. Five new tests covering:
1. Parser-generated roundtrip (regression guard)
2. Column alias roundtrip with subquery aggregate
3. Limit folding with manually constructed plan (was the primary bug)
4. Aggregate alias preservation without outer rename
5. Sort folding with manually constructed plan (same bug pattern)
## Are there any user-facing changes?
No API changes. The fix corrects SQL output for programmatically constructed
logical plans that have Projection → Limit → Aggregate or Projection → Sort →
Aggregate shapes without an intermediate Projection.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]