Rachelint commented on PR #11943:
URL: https://github.com/apache/datafusion/pull/11943#issuecomment-2288564387

   > > @2010YOUY01 After checking the code for memory control, I think I got it.
   > > 
   > > * `emit_early_if_necessary` is used in `Partial`
   > > * and `spill_previous_if_necessary` is used in the final phases
   > > 
   > > Both serve spilling, and the logic is roughly as follows:
   > > 
   > > * After reaching the memory limit, force `Partial` to submit batches to `Final` as soon as possible
   > > * `Final` then spills them to disk to avoid OOM
   > > * After all batches are submitted to `Final`, `Final` merges the spilled batches and the in-memory batches to get the final results (in the streaming aggregation way; batches are sorted before spilling).
   > 
   > Thanks, now I figured out the high-level idea of spilling in aggregation and how `emit` works in its implementation.
   > 
   > However, there is other code that does early emit in aggregation, and I'm still trying to figure out how it works. Do you have any pointers for that? I'm guessing it's used in streaming aggregation or some pushed-down limits.
   > 
   > https://github.com/apache/datafusion/blob/482ef4551a4828825da8deb29d222fa82e1cfaa9/datafusion/physical-plan/src/aggregates/row_hash.rs#L605-L611
   
   Yes, you are right. There are two early emission cases: one is for the spilling mentioned above, and the other, shown here, is for streaming.
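
For the spilling case, the flow quoted above could be sketched roughly like this. It is a toy model, not the DataFusion code: the `Partial` and `Final` structs, the group-count limit standing in for the memory limit, and the in-memory `Vec` standing in for spill files are all assumptions made for illustration. Only the names `emit_early_if_necessary` and `spill_previous_if_necessary` come from the discussion, and their bodies here are simplified guesses at the idea.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a batch of partially aggregated (key, sum) rows.
type Batch = Vec<(String, i64)>;

/// Hypothetical `Partial` phase: aggregates in memory and hands its state
/// downstream early once the group-count limit is reached.
struct Partial {
    groups: BTreeMap<String, i64>,
    max_groups: usize, // stands in for the memory limit
}

impl Partial {
    fn new(max_groups: usize) -> Self {
        Partial { groups: BTreeMap::new(), max_groups }
    }

    /// Accumulate one row; returns a batch if an early emit was triggered.
    fn push(&mut self, key: &str, value: i64) -> Option<Batch> {
        *self.groups.entry(key.to_string()).or_insert(0) += value;
        self.emit_early_if_necessary()
    }

    fn emit_early_if_necessary(&mut self) -> Option<Batch> {
        if self.groups.len() >= self.max_groups {
            Some(self.flush())
        } else {
            None
        }
    }

    /// Emit everything currently held and reset the in-memory state.
    fn flush(&mut self) -> Batch {
        std::mem::take(&mut self.groups).into_iter().collect()
    }
}

/// Hypothetical `Final` phase: spills its state before taking more input
/// when over the limit, then merges spilled and in-memory state at the end.
struct Final {
    groups: BTreeMap<String, i64>,
    max_groups: usize,
    spills: Vec<Batch>, // a Vec standing in for sorted spill files on disk
}

impl Final {
    fn new(max_groups: usize) -> Self {
        Final { groups: BTreeMap::new(), max_groups, spills: Vec::new() }
    }

    fn push_batch(&mut self, batch: Batch) {
        self.spill_previous_if_necessary();
        for (key, value) in batch {
            *self.groups.entry(key).or_insert(0) += value;
        }
    }

    fn spill_previous_if_necessary(&mut self) {
        if self.groups.len() >= self.max_groups {
            // BTreeMap iterates in key order, so each spilled batch is sorted.
            let sorted: Batch = std::mem::take(&mut self.groups).into_iter().collect();
            self.spills.push(sorted);
        }
    }

    /// Merge spilled batches with the in-memory groups. A map merge is used
    /// here for brevity instead of a streaming merge over sorted runs.
    fn finish(self) -> Batch {
        let Final { groups: mut merged, spills, .. } = self;
        for spill in spills {
            for (key, value) in spill {
                *merged.entry(key).or_insert(0) += value;
            }
        }
        merged.into_iter().collect()
    }
}

/// Drive both phases over the input rows with the same toy limit.
fn run(rows: &[(&str, i64)], limit: usize) -> Batch {
    let mut partial = Partial::new(limit);
    let mut fin = Final::new(limit);
    for (key, value) in rows {
        if let Some(batch) = partial.push(key, *value) {
            fin.push_batch(batch);
        }
    }
    fin.push_batch(partial.flush()); // end of input: emit what is left
    fin.finish()
}

fn main() {
    let rows = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)];
    println!("{:?}", run(&rows, 2));
}
```

The point of the sketch is only that the result should not depend on the limit: a small limit forces early emits and spills, and the final merge is expected to produce the same sums as a run that never spills.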


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

