Dandandan opened a new pull request, #23206: URL: https://github.com/apache/datafusion/pull/23206
## Which issue does this PR close?

<!-- No issue filed; self-contained window change. Happy to open one if preferred. -->

- Closes #.

## Rationale for this change

`WindowAggExec` (the non-streaming window operator, used when a frame ends in `UNBOUNDED FOLLOWING`, a UDWF lacks bounded execution, etc.) buffers the entire input, computes all window columns, and then emits the result as **one `RecordBatch` sized to the whole input**. That forces every downstream operator which doesn't internally coalesce (sort ingest, joins, the client, …) to hold a single batch covering all rows at once, unlike `AggregateExec` (`row_hash.rs`) and `BoundedWindowAggExec`, which both honor `batch_size`.

## What changes are included in this PR?

- `WindowAggStream` now stores the fully-computed result and emits it in `batch_size`-row slices across polls. Slicing is zero-copy (`RecordBatch::slice` adjusts offset/length over shared buffers), so this adds **no** per-row work and no extra copy; it only bounds the batch each downstream operator must hold.
- `batch_size` is read from the session config in `execute` (before `context` is moved into the child) and clamped to at least 1.

Scope note: the window computation itself is unchanged, so this does **not** reduce `WindowAggExec`'s own peak memory (it still buffers all input + the concatenated copy). That is a separate, larger concern; this PR only stops forcing a mega-batch onto downstream consumers.

## Are these changes tested?

Yes:

- A new unit test asserts a 10-row result is emitted as 4/4/2-row chunks with `batch_size = 4`, and that the running-count window column is unaffected by chunking.
- The full `window` sqllogictest suite passes (all 6 files).

Plan-shape tests are unaffected (only output batching changes, not the plan).

## Are there any user-facing changes?

No behavior change beyond output batch sizing (results and ordering are identical); `WindowAggExec` now honors the configured `batch_size` like other operators.
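
For readers who want the chunking arithmetic spelled out, here is a minimal sketch of how a fully computed result of `total_rows` rows could be split into `batch_size`-row slices, with `batch_size` clamped to at least 1 as described above. It is not the code from this PR: `chunk_ranges` is a hypothetical helper written against the standard library only, and each `(offset, len)` pair stands in for the arguments one would pass to `RecordBatch::slice`.

```rust
/// Hypothetical helper (not taken from the PR): returns the `(offset, len)`
/// pairs that cover `total_rows` rows in chunks of at most `batch_size` rows.
/// Each pair corresponds to one zero-copy slice of the buffered result.
fn chunk_ranges(total_rows: usize, batch_size: usize) -> Vec<(usize, usize)> {
    // Clamp to at least 1 so a configured batch size of 0 cannot stall the loop.
    let batch_size = batch_size.max(1);

    let mut ranges = Vec::new();
    let mut offset = 0;
    while offset < total_rows {
        // The final chunk may be shorter than `batch_size`.
        let len = batch_size.min(total_rows - offset);
        ranges.push((offset, len));
        offset += len;
    }
    ranges
}
```

With `total_rows = 10` and `batch_size = 4` this yields `(0, 4)`, `(4, 4)`, `(8, 2)`, matching the 4/4/2 split asserted by the unit test. An empty input yields no slices at all.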
