alamb commented on code in PR #15405: URL: https://github.com/apache/datafusion/pull/15405#discussion_r2012559291
########## datafusion/physical-plan/src/aggregates/row_hash.rs: ########## @@ -981,28 +978,30 @@ impl GroupedHashAggregateStream { Ok(()) } - /// Emit all rows, sort them, and store them on disk. + /// Emit all intermediate aggregation states, sort them, and store them on disk. + /// This process helps in reducing memory pressure by allowing the data to be + /// read back with streaming merge. fn spill(&mut self) -> Result<()> { + // Emit and sort intermediate aggregation state let Some(emit) = self.emit(EmitTo::All, true)? else { return Ok(()); }; let sorted = sort_batch(&emit, self.spill_state.spill_expr.as_ref(), None)?; Review Comment: eventually it might make sense to have the spill manager handle sorting the runs too (so it could potentially merge multiple files into a single run to reduce fanout, etc ########## datafusion/physical-plan/src/aggregates/row_hash.rs: ########## @@ -981,28 +978,30 @@ impl GroupedHashAggregateStream { Ok(()) } - /// Emit all rows, sort them, and store them on disk. + /// Emit all intermediate aggregation states, sort them, and store them on disk. Review Comment: ❤️ ########## datafusion/physical-plan/src/spill.rs: ########## @@ -92,6 +50,10 @@ fn read_spill(sender: Sender<Result<RecordBatch>>, path: &Path) -> Result<()> { /// Spill the `RecordBatch` to disk as smaller batches /// split by `batch_size_rows` +#[deprecated( + since = "46.0.0", + note = "This method is deprecated. Use `SpillManager::spill_record_batch_by_size` instead." Review Comment: 👍 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org