kosiew opened a new pull request, #22075:
URL: https://github.com/apache/datafusion/pull/22075
## Which issue does this PR close?
* Part of #20788
## Rationale for this change
This PR adds a compact and reproducible test shape for the reported
high-memory query pattern involving:
* list column expansion via `unnest`
* row explosion
* regrouping with `GROUP BY`
* ordered aggregation using `array_agg(... ORDER BY ...)`
The goal is to isolate and document the execution shape before optimizer or
executor fixes are introduced. The reproducer is intentionally bounded so it
can run reliably in local and CI environments while still demonstrating the
problematic expansion pattern.
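To make the execution shape concrete, here is a minimal pure-Python model of the pattern. It is a sketch for illustration only, not the SQL added in this PR and not how DataFusion executes the query. The helper names (`make_rows`, `unnest`, `regroup_ordered`) and the values stored in each list are assumptions made up for this example.

```python
from collections import defaultdict


def make_rows(num_rows, list_len):
    # Hypothetical stand-in for building a list column with `range`:
    # each input row carries a list of `list_len` values.
    return [
        (row_id, [row_id * 100 + i for i in range(list_len)])
        for row_id in range(num_rows)
    ]


def unnest(rows):
    # Row explosion: emit one (row_id, idx, val) row per list element.
    for row_id, vals in rows:
        for idx, val in enumerate(vals):
            yield (row_id, idx, val)


def regroup_ordered(exploded):
    # Analogue of GROUP BY row_id with array_agg(val ORDER BY idx):
    # every (idx, val) pair of a group is buffered before it can be sorted.
    groups = defaultdict(list)
    for row_id, idx, val in exploded:
        groups[row_id].append((idx, val))
    return {
        row_id: [val for _, val in sorted(pairs)]
        for row_id, pairs in groups.items()
    }


rows = make_rows(num_rows=3, list_len=4)
exploded = list(unnest(rows))
# Feed the rows in reverse to show that the result depends on idx, not arrival order.
regrouped = regroup_ordered(reversed(exploded))
```

In this model the intermediate row count is `num_rows * list_len`, and the ordered aggregate has to hold a whole group before it can emit anything. That buffering is the kind of cost the reproducer is meant to expose.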
## What changes are included in this PR?
* Added a new benchmark:
  * `benchmarks/sql_benchmarks/unnest_array_agg/benchmarks/q01.benchmark`
* Added SQLLogicTest coverage:
  * `datafusion/sqllogictest/test_files/unnest_array_agg_repro.slt`
* Added a bounded synthetic workload that:
  * creates list columns using `range`
  * expands them with `unnest`
  * regroups rows using `array_agg(val ORDER BY idx)`
* Added validation of the intermediate row expansion count.
* Captured `EXPLAIN VERBOSE` output for the reproducer, including:
  * logical plan
  * initial physical plan
  * physical execution plan
  * schema details for ordered aggregate state
* Added configurable benchmark scaling via:
  * `UNNEST_ARRAY_AGG_ROWS`
  * `UNNEST_ARRAY_AGG_LIST_LEN`
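
As a rough guide to how the two scaling variables interact, the sketch below estimates the intermediate row count after `unnest`, assuming every input row carries a list of exactly `UNNEST_ARRAY_AGG_LIST_LEN` elements. The function name and the default values are hypothetical and do not come from the benchmark files. How the benchmark actually reads these variables is not shown here.

```python
import os


def expanded_row_count(env=os.environ, default_rows=1000, default_list_len=100):
    # Defaults are placeholders for illustration, not the benchmark's real defaults.
    rows = int(env.get("UNNEST_ARRAY_AGG_ROWS", default_rows))
    list_len = int(env.get("UNNEST_ARRAY_AGG_LIST_LEN", default_list_len))
    # Assumption: each of the `rows` input rows expands into `list_len` rows.
    return rows * list_len


# Example with explicit settings instead of the process environment.
example = expanded_row_count(
    {"UNNEST_ARRAY_AGG_ROWS": "10", "UNNEST_ARRAY_AGG_LIST_LEN": "5"}
)
```

Under that assumption the expansion is multiplicative, so doubling either variable doubles the number of rows flowing into the aggregate.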
## Are these changes tested?
Yes.
This PR adds:
* SQLLogicTest coverage in:
  * `datafusion/sqllogictest/test_files/unnest_array_agg_repro.slt`
* A benchmark reproducer in:
  * `benchmarks/sql_benchmarks/unnest_array_agg/benchmarks/q01.benchmark`
The SLT verifies:
* row expansion counts
* ordered `array_agg` results
* `EXPLAIN VERBOSE` plan shape including `UnnestExec` and `AggregateExec`
## Are there any user-facing changes?
No user-facing changes. This PR only adds regression coverage and
benchmarking infrastructure for a specific query shape.
## LLM-generated code disclosure
This PR includes LLM-generated code and comments. All LLM-generated content
has been manually reviewed and tested.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]