2010YOUY01 commented on code in PR #14868:
URL: https://github.com/apache/datafusion/pull/14868#discussion_r1971193787
##########
datafusion/physical-plan/src/spill.rs:
##########
@@ -59,13 +58,13 @@ pub(crate) fn read_spill_as_stream(
///
/// Returns total number of the rows spilled to disk.
pub(crate) fn spill_record_batches(
- batches: Vec<RecordBatch>,
+ batches: &[RecordBatch],
Review Comment:
I understand that taking a slice is better; however, we might prefer to keep
the public API stable and avoid this change unless the overhead is noticeable.
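
To make the trade-off concrete, here is a minimal sketch of the two signatures side by side. It is only an illustration under stated assumptions: `Batch`, `spill_owned`, `spill_borrowed`, and `demo` are hypothetical stand-ins, not DataFusion code, and the real function writes to disk rather than just counting rows.

```rust
/// Hypothetical stand-in for arrow's `RecordBatch`; only the row count matters here.
#[derive(Clone, Debug, PartialEq)]
pub struct Batch {
    pub num_rows: usize,
}

/// Owned signature, mirroring the existing `batches: Vec<RecordBatch>` parameter.
/// The caller gives up the vector (or has to clone it first).
pub fn spill_owned(batches: Vec<Batch>) -> usize {
    batches.iter().map(|b| b.num_rows).sum()
}

/// Borrowed signature, mirroring the proposed `batches: &[RecordBatch]` parameter.
/// The caller keeps the vector, but every existing call site must add `&`.
pub fn spill_borrowed(batches: &[Batch]) -> usize {
    batches.iter().map(|b| b.num_rows).sum()
}

/// Calls both variants on the same data and returns both row totals.
pub fn demo() -> (usize, usize) {
    let batches = vec![Batch { num_rows: 3 }, Batch { num_rows: 5 }];
    // Borrowing leaves `batches` usable afterwards.
    let borrowed_total = spill_borrowed(&batches);
    // Passing by value moves `batches`; it cannot be used after this line.
    let owned_total = spill_owned(batches);
    (borrowed_total, owned_total)
}
```

Both variants compute the same result; the difference is only in what the call site has to write, which is why the change affects callers even though the function body barely changes.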
##########
datafusion/physical-plan/src/sorts/sort.rs:
##########
@@ -439,36 +440,35 @@ impl ExternalSorter {
// `self.in_mem_batches` is already taken away by the sort_stream, now it is empty.
// We'll gradually collect the sorted stream into self.in_mem_batches, or directly
// write sorted batches to disk when the memory is insufficient.
- let mut spill_writer: Option<IPCWriter> = None;
+ let mut spill_writer: Option<IPCStreamWriter> = None;
Review Comment:
A refactor that simplifies this code is in
https://github.com/apache/datafusion/pull/14823; we plan to merge it in a day
if there are no objections.
I think that once that refactor is merged, no change is needed here to
support the `IPC Stream` writer; implementing it inside
`spill_record_batches()` should be enough.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]