comphead commented on code in PR #11218:
URL: https://github.com/apache/datafusion/pull/11218#discussion_r1676989856

##########
datafusion/physical-plan/src/lib.rs:
##########

@@ -852,6 +852,30 @@ pub fn spill_record_batches(
     Ok(writer.num_rows)
 }

+/// Spill the `RecordBatch` to disk as smaller batches
+/// split by `batch_size`.
+/// Returns the total number of rows spilled.
+pub fn spill_record_batch_by_size(
+    batch: RecordBatch,
+    path: PathBuf,
+    schema: SchemaRef,
+    batch_size: usize,
+) -> Result<usize, DataFusionError> {

Review Comment:
   Exactly, the idea behind this is to produce sub-batches so the consumer can read the data back using less memory. We use the same approach in `row_hash.rs`.
   Ideally, to be honest, this would return a `SendableRecordBatchStream`, but that will require more effort since the SMJ (sort-merge join) is not async for now.
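   For reference, a minimal sketch of the slicing idea described above: cut the input into zero-copy sub-batches of at most `batch_size` rows and write each one to the spill file, returning the total row count. This is an illustration, not the PR's actual implementation; the function name, the use of arrow's IPC `StreamWriter` (DataFusion's spill path has its own IPC writer helper), and the `arrow::error::Result` return type are assumptions.

   ```rust
   use std::fs::File;
   use std::path::PathBuf;
   use std::sync::Arc;

   use arrow::datatypes::Schema;
   use arrow::ipc::writer::StreamWriter;
   use arrow::record_batch::RecordBatch;

   // Hypothetical sketch of the "split by batch_size" spill; not the PR code.
   fn spill_record_batch_by_size_sketch(
       batch: RecordBatch,
       path: PathBuf,
       schema: Arc<Schema>,
       batch_size: usize,
   ) -> arrow::error::Result<usize> {
       let file = File::create(path)?;
       let mut writer = StreamWriter::try_new(file, &schema)?;

       let total_rows = batch.num_rows();
       let mut offset = 0;
       while offset < total_rows {
           let length = batch_size.min(total_rows - offset);
           // `slice` is zero-copy: it only adjusts offsets on the underlying
           // arrays, so each written sub-batch can later be read back
           // independently with a smaller memory footprint on the consumer side.
           let sub_batch = batch.slice(offset, length);
           writer.write(&sub_batch)?;
           offset += length;
       }
       writer.finish()?;

       Ok(total_rows)
   }
   ```

   Writing many small batches rather than one large one is what lets the reader pull the spilled data back a sub-batch at a time instead of materializing the whole spilled batch in memory.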
########## datafusion/physical-plan/src/lib.rs: ########## @@ -852,6 +852,30 @@ pub fn spill_record_batches( Ok(writer.num_rows) } +/// Spill the `RecordBatch` to disk as smaller batches +/// split by `batch_size` +/// Return `total_rows` what is spilled +pub fn spill_record_batch_by_size( + batch: RecordBatch, + path: PathBuf, + schema: SchemaRef, + batch_size: usize, +) -> Result<usize, DataFusionError> { Review Comment: Exactly the idea behind is to make sub batches to help the consumer reading data using less memory. The same approach we use in `row_hash.rs` Ideally to be honest is to return stream `SendableBatchRecordStream` but this will require more efforts as the SMJ is not async for now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org