wirybeaver commented on issue #22946:
URL: https://github.com/apache/datafusion/issues/22946#issuecomment-4713690458

   One tradeoff worth making explicit for this feature request: the current 
whole-input `WindowAggExec` model may still be the fastest path for small and 
medium-sized inputs when memory is sufficient, because it concatenates once, 
evaluates over larger contiguous batches, and emits fewer output batches.
   
   The proposed spill-oriented approach optimizes for a different failure mode: 
large/skewed inputs and memory-limited execution.
   
   ```text
   Current upstream model:
     buffer all input -> concat all input -> compute all partitions -> emit once
   
   Spill-oriented model:
     buffer one active partition -> spill it if needed -> compute completed partition -> emit partition output
   ```
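
To make the contrast between the two models concrete, here is a minimal, self-contained Rust sketch. It does not use DataFusion APIs: `Row`, `running_sum`, `whole_input`, and `per_partition` are hypothetical names invented for illustration, rows are plain `(partition key, value)` tuples assumed to arrive sorted by partition key, and the spill step is only marked by a comment rather than implemented.

```rust
/// A row is (partition key, value). Input is assumed sorted by partition key.
type Row = (u32, i64);

/// Illustrative window function: running sum within a single partition.
fn running_sum(partition: &[Row]) -> Vec<i64> {
    let mut acc = 0;
    partition
        .iter()
        .map(|&(_, v)| {
            acc += v;
            acc
        })
        .collect()
}

/// Whole-input model: buffer all input, compute all partitions, emit once.
fn whole_input(input: impl Iterator<Item = Row>) -> Vec<Vec<i64>> {
    let all: Vec<Row> = input.collect(); // entire input is held in memory
    let mut out = Vec::with_capacity(all.len());
    let mut start = 0;
    while start < all.len() {
        let key = all[start].0;
        // Length of the run of rows sharing the current partition key.
        let len = all[start..].iter().take_while(|r| r.0 == key).count();
        out.extend(running_sum(&all[start..start + len]));
        start += len;
    }
    vec![out] // a single output batch
}

/// Partition-at-a-time model: buffer only the active partition and emit one
/// output batch each time a partition completes.
fn per_partition(input: impl Iterator<Item = Row>) -> Vec<Vec<i64>> {
    let mut batches = Vec::new();
    let mut active: Vec<Row> = Vec::new();
    for row in input {
        let boundary = active.first().map_or(false, |first| first.0 != row.0);
        if boundary {
            batches.push(running_sum(&active));
            active.clear();
        }
        // A real operator would spill `active` to disk here if it exceeded
        // its memory budget; this sketch keeps it in memory.
        active.push(row);
    }
    if !active.is_empty() {
        batches.push(running_sum(&active));
    }
    batches
}
```

Both functions produce the same values. The difference is in peak buffering (all rows versus one partition) and in the number of output batches (one versus one per partition), which is the tradeoff described above.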
   
   If we want to preserve the existing fully in-memory fast path, one possible 
design is to keep the current `WindowAggExec` behavior and introduce a separate 
operator such as `SpillingWindowAggExec`, selected by the planner/config when 
spill support is desired.
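
As a rough illustration of that selection, the sketch below picks an operator from a single flag. Everything here is an assumption for discussion: `WindowPlanOptions` and `enable_window_spill` are hypothetical names and not existing DataFusion configuration options, `SpillingWindowAggExec` is only the name proposed above, and the enum stands in for real physical plan nodes.

```rust
/// Stand-in for the physical operator the planner would build.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WindowOperator {
    /// Existing whole-input, fully in-memory operator.
    WindowAggExec,
    /// Proposed spill-capable operator (name taken from this comment).
    SpillingWindowAggExec,
}

/// Hypothetical planner options; not an existing DataFusion config struct.
#[derive(Debug, Clone, Copy, Default)]
struct WindowPlanOptions {
    /// Hypothetical opt-in flag for spill support.
    enable_window_spill: bool,
}

/// Keep the in-memory fast path by default and switch only on opt-in.
fn choose_window_operator(opts: WindowPlanOptions) -> WindowOperator {
    if opts.enable_window_spill {
        WindowOperator::SpillingWindowAggExec
    } else {
        WindowOperator::WindowAggExec
    }
}
```

Defaulting the flag to `false` would leave existing plans unchanged, so the in-memory path keeps its current performance characteristics unless spill support is requested.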
   
   I am also open to exploring whether the spill/streaming work should be 
integrated with `BoundedWindowAggExec`, especially for bounded frames as 
mentioned above. My hesitation is that `BoundedWindowAggExec` already has a 
specialized in-memory state/pruning model, so disk-backed state there likely 
deserves a separate focused design rather than being mixed into the initial 
spill PR.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

