Rachelint commented on PR #11943: URL: https://github.com/apache/datafusion/pull/11943#issuecomment-2286134471
> I think I'm not so familiar with the Emit::First and there is no block implementation done yet. Could we emit every block size of values we have? Something like Emit::First(block size).
>
> We have `emit_early_if_necessary` that does First(n) emission when the condition is met.
>
> ```rust
> fn emit_early_if_necessary(&mut self) -> Result<()> {
>     if self.group_values.len() >= self.batch_size
>         && matches!(self.group_ordering, GroupOrdering::None)
>         && matches!(self.mode, AggregateMode::Partial)
>         && self.update_memory_reservation().is_err()
>     {
>         let n = self.group_values.len() / self.batch_size * self.batch_size;
>         let batch = self.emit(EmitTo::First(n), false)?;
>         self.exec_state = ExecutionState::ProducingOutput(batch);
>     }
>     Ok(())
> }
> ```
>
> If we emit every block size we accumulated, is it something similar to the block approach? If not, what is the difference?
>
> Upd: One difference I can think of is that in the block approach, we have all the accumulated values, and we can optimize based on **all the values** we have, while in Emit::First mode, we emit partial values early, so we lose the chance to do optimization based on **all the values** 🤔 ?

I think I got it: if we constantly emit the first n when the data size is bigger than , it is actually equivalent to the blocked approach.

But to my knowledge, `emit first n` is only triggered in some special cases:

- Streaming aggregation (the one case that really does constantly `emit first n`; I forced blocked mode to be disabled in this case)
- In the `Partial` operator, if memory is found to exceed the limit (`emit_early_if_necessary`)

In the other cases, we need to poll to the end, then `emit all` and use the `slice` method to split the result and return it:

- For example, in the `FinalPartitioned` operator
- In the `Partial` operator when memory is under the memory limit
- other cases...

And in such cases, the blocked approach may be efficient for both memory and CPU, as stated above?
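
To make the three emission patterns discussed above concrete, here is a minimal sketch. It is not DataFusion's actual code: `FlatGroups`, `BlockedGroups`, and all of their methods are hypothetical names, and plain `u64` values stand in for the real group and accumulator state. It only assumes that a flat layout keeps everything in one contiguous buffer and that a blocked layout keeps fixed-size blocks.

```rust
use std::collections::VecDeque;

/// Hypothetical flat layout: every group value lives in one contiguous Vec.
struct FlatGroups {
    values: Vec<u64>,
}

impl FlatGroups {
    /// Analogue of "emit first n": hand out the first `n` values.
    /// `Vec::drain` moves the values that remain down to the front of the buffer.
    fn emit_first(&mut self, n: usize) -> Vec<u64> {
        let n = n.min(self.values.len());
        self.values.drain(..n).collect()
    }

    /// Analogue of "emit all, then slice": take everything that was
    /// accumulated and cut it into batches of at most `batch_size` values.
    /// This sketch copies each chunk for simplicity.
    fn emit_all_sliced(&mut self, batch_size: usize) -> Vec<Vec<u64>> {
        assert!(batch_size > 0, "batch_size must be non-zero");
        let all = std::mem::take(&mut self.values);
        all.chunks(batch_size).map(|chunk| chunk.to_vec()).collect()
    }
}

/// Hypothetical blocked layout: values are stored in blocks of `block_size`.
struct BlockedGroups {
    block_size: usize,
    blocks: VecDeque<Vec<u64>>,
}

impl BlockedGroups {
    fn new(block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be non-zero");
        Self { block_size, blocks: VecDeque::new() }
    }

    /// Append a value, opening a new block when the last one is full.
    fn push(&mut self, value: u64) {
        let need_new_block = self
            .blocks
            .back()
            .map_or(true, |block| block.len() == self.block_size);
        if need_new_block {
            self.blocks.push_back(Vec::with_capacity(self.block_size));
        }
        self.blocks.back_mut().unwrap().push(value);
    }

    /// Emit one block at a time: the front block is handed out as-is,
    /// and the remaining blocks are not touched.
    fn emit_next_block(&mut self) -> Option<Vec<u64>> {
        self.blocks.pop_front()
    }
}
```

In this sketch, calling `emit_first(n)` with `n = len / batch_size * batch_size` mirrors the computation in `emit_early_if_necessary`, while `emit_next_block` returns batch-sized output directly even after all input has been accumulated, which is the situation described for `FinalPartitioned` and for `Partial` under the memory limit.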