2010YOUY01 commented on PR #11943: URL: https://github.com/apache/datafusion/pull/11943#issuecomment-2287796117
> > I think I'm not so familiar with the `Emit::First` and there is no block implementation done yet. Could we emit every block size of values we have? Something like `Emit::First(block size)`.
>
> We have `emit_early_if_necessary` that does a `First(n)` emission when the condition is met:
>
> ```rust
> fn emit_early_if_necessary(&mut self) -> Result<()> {
>     if self.group_values.len() >= self.batch_size
>         && matches!(self.group_ordering, GroupOrdering::None)
>         && matches!(self.mode, AggregateMode::Partial)
>         && self.update_memory_reservation().is_err()
>     {
>         let n = self.group_values.len() / self.batch_size * self.batch_size;
>         let batch = self.emit(EmitTo::First(n), false)?;
>         self.exec_state = ExecutionState::ProducingOutput(batch);
>     }
>     Ok(())
> }
> ```

> > If we emit every block size we accumulated, is it something similar to the block approach? If not, what is the difference?
> >
> > Upd: One difference I can think of is that in the block approach we keep all the accumulated values and can optimize based on **all the values** we have, while in `Emit::First` mode we emit partial values early, so we lose the chance to optimize based on **all the values** 🤔 ?
>
> Ok, I think I got it now: if we constantly `emit first n` whenever the group len equals the `batch size`, it is effectively equal to the blocked approach.
>
> But `emit first n` will only be triggered in some special cases, to my knowledge:
>
> * Streaming aggr (the truly constant `emit first n` case, and I was forced to disable blocked mode in this case)
> * In the `Partial` operator if the memory limit is found to be exceeded (`emit_early_if_necessary`)
>
> In the other cases, we need to poll to the end, then `emit all` and currently use the `slice` method to split it up and return:
>
> * For example, in the `FinalPartitioned` operator
> * In the `Partial` operator when memory is under the memory limit
> * other cases...
> And in such cases, the blocked approach may be efficient for both memory and CPU as stated above?

I still don't understand this `FirstBlocks` variant: if the partial aggregate runs out of memory, emitting the first N blocks and letting the final aggregation process them first looks like it won't help much with total memory usage (since it's a high-cardinality aggregation). Is it related to the spilling implementation? I will check it later.

Also, thanks for pushing this forward. I think this approach is promising for performance.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
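The blocked-vs-`emit first n` distinction discussed above can be sketched with a toy example. Everything below uses hypothetical, simplified types (`FlatGroupValues`, `BlockedGroupValues`), not DataFusion's actual `GroupValues` implementation; it only illustrates why emitting a whole block can be cheaper than repeatedly slicing the front off one contiguous buffer:

```rust
use std::collections::VecDeque;

/// Flat layout: one contiguous buffer. Emitting the first `n` values
/// must shift the remaining tail down so indices stay contiguous,
/// an O(remaining) cost paid on every early emission.
struct FlatGroupValues {
    values: Vec<u64>,
}

impl FlatGroupValues {
    fn emit_first(&mut self, n: usize) -> Vec<u64> {
        // `drain` moves the tail toward the front after removal.
        self.values.drain(..n).collect()
    }
}

/// Blocked layout: values live in fixed-size blocks, so emitting a
/// block is just popping it from the front -- no shifting of the
/// remaining data.
struct BlockedGroupValues {
    block_size: usize,
    blocks: VecDeque<Vec<u64>>,
}

impl BlockedGroupValues {
    fn push(&mut self, v: u64) {
        match self.blocks.back_mut() {
            // Append to the current block while it has room.
            Some(b) if b.len() < self.block_size => b.push(v),
            // Otherwise start a new block.
            _ => self.blocks.push_back(vec![v]),
        }
    }

    /// Emit one whole block in O(1), leaving the other blocks untouched.
    fn emit_block(&mut self) -> Option<Vec<u64>> {
        self.blocks.pop_front()
    }
}

fn main() {
    let mut flat = FlatGroupValues { values: (0..10).collect() };
    assert_eq!(flat.emit_first(4), vec![0, 1, 2, 3]);
    assert_eq!(flat.values.len(), 6); // tail was shifted down

    let mut blocked = BlockedGroupValues {
        block_size: 4,
        blocks: VecDeque::new(),
    };
    (0..10).for_each(|v| blocked.push(v));
    assert_eq!(blocked.emit_block(), Some(vec![0, 1, 2, 3])); // popped in O(1)
}
```

This also hints at the "optimize over **all the values**" point in the thread: the blocked layout still retains every accumulated value (just partitioned into blocks), whereas constant `First(n)` emission discards the emitted prefix from the operator's state.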