comphead commented on code in PR #11218:
URL: https://github.com/apache/datafusion/pull/11218#discussion_r1676989856


##########
datafusion/physical-plan/src/lib.rs:
##########
@@ -852,6 +852,30 @@ pub fn spill_record_batches(
     Ok(writer.num_rows)
 }
 
+/// Spill the `RecordBatch` to disk as smaller batches
+/// split by `batch_size`.
+/// Returns the total number of rows spilled.
+pub fn spill_record_batch_by_size(
+    batch: RecordBatch,
+    path: PathBuf,
+    schema: SchemaRef,
+    batch_size: usize,
+) -> Result<usize, DataFusionError> {

Review Comment:
   Exactly, the idea behind this is to produce sub-batches so the consumer can read the data using less memory. It is the same approach we use in `row_hash.rs`.
   Ideally, to be honest, this would return a `SendableRecordBatchStream`, but that will require more effort since the SMJ is not async for now.
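
   For readers following along, here is a minimal sketch of what such a splitting spill function could look like. The signature comes from the diff above; the body is an assumption, written against arrow's IPC `FileWriter` and `RecordBatch::slice`, and the actual PR may reuse DataFusion's existing spill writer instead.

   ```rust
   use std::fs::File;
   use std::path::PathBuf;

   use arrow::datatypes::SchemaRef;
   use arrow::ipc::writer::FileWriter;
   use arrow::record_batch::RecordBatch;
   use datafusion_common::DataFusionError;

   /// Hypothetical sketch: spill `batch` to `path` as sub-batches of at most
   /// `batch_size` rows each, returning the total number of rows written.
   pub fn spill_record_batch_by_size(
       batch: RecordBatch,
       path: PathBuf,
       schema: SchemaRef,
       batch_size: usize,
   ) -> Result<usize, DataFusionError> {
       let file = File::create(&path)?;
       // Assumption: write slices with arrow's IPC FileWriter; the PR itself
       // may go through a different spill writer internally.
       let mut writer = FileWriter::try_new(file, &schema)?;

       let total_rows = batch.num_rows();
       let mut offset = 0;
       // `batch_size` is assumed to be non-zero.
       while offset < total_rows {
           // Each slice is a zero-copy view of at most `batch_size` rows, so the
           // reader can later consume the file in small, memory-friendly chunks.
           let len = batch_size.min(total_rows - offset);
           writer.write(&batch.slice(offset, len))?;
           offset += len;
       }
       writer.finish()?;

       Ok(total_rows)
   }
   ```

   The point of slicing before writing is that the on-disk file then contains many small IPC batches, so the reader never has to materialize the whole original batch at once.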



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

