Rachelint commented on PR #11943:
URL: https://github.com/apache/datafusion/pull/11943#issuecomment-2286134471

   > I think I'm not so familiar with the Emit::First and there is no block 
implementation done yet. Could we emit every block size of values we have? 
Something like Emit::First(block size).
   > 
   > We have `emit_early_if_necessary` that do First(n) emission when condition 
met.
   > 
   > ```rust
   >     fn emit_early_if_necessary(&mut self) -> Result<()> {
   >         if self.group_values.len() >= self.batch_size
   >             && matches!(self.group_ordering, GroupOrdering::None)
   >             && matches!(self.mode, AggregateMode::Partial)
   >             && self.update_memory_reservation().is_err()
   >         {
   >             let n = self.group_values.len() / self.batch_size * 
self.batch_size;
   >             let batch = self.emit(EmitTo::First(n), false)?;
   >             self.exec_state = ExecutionState::ProducingOutput(batch);
   >         }
   >         Ok(())
   >     }
   > ```
   > 
   > If we emit every block size we accumulated, is it something similar to the 
block approach? If not, what is the difference?
   > 
   > Upd: One difference I can think of is that in block approach, we have all 
the accumulated values, and we can optimize it based on **all the values** we 
have, while in Emit::First mode, we early emit partial values, therefore, we 
loss the change if we want to do optimization based on **all the values** 🤔 ?
   
   I think I got it, if we constantly emit first n when the data size bigger 
than , it is actually equal to blocked approach.
   
   But `emit first n` will be just triggered in some special cases in my 
knowledge:
   - Streaming aggr (the really constantly `emit first n` case, and I forced to 
disable blocked mode in this case)
   - In `Partial` operator  if found the memory exceeded limit 
(`emit_early_if_necessary`)
   
   And in others, we need to poll to end, then `emit all` and use `slice` 
method to split it and return:
   - For example, in the `FinalPartitioned` operator
   - In the `Partial` operator when memory under the memory limit
   - other cases...
   And in such cases, blocked approach may be effecient for both memory and cpu 
as stated above?
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to