2010YOUY01 commented on code in PR #17163:
URL: https://github.com/apache/datafusion/pull/17163#discussion_r2286915308
##########
datafusion/physical-plan/src/sorts/streaming_merge.rs:
##########
@@ -88,6 +88,10 @@ pub struct StreamingMergeBuilder<'a> {
fetch: Option<usize>,
reservation: Option<MemoryReservation>,
enable_round_robin_tie_breaker: bool,
+ /// Ratio of memory used by cursor batch to the original input RecordBatch.
+ /// Used in `get_reserved_byte_for_record_batch_size` to estimate required memory for merge phase.
+ /// Only passed when constructing MultiLevelMergeBuilder
Review Comment:
```suggestion
/// Only passed when constructing MultiLevelMergeBuilder
///
/// A cursor is an interface for comparing entries between two batches.
/// It includes the original batch and the sort keys converted to Arrow Row format
/// for faster comparison. See `cursor.rs` for more details.
```
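To make the cursor idea concrete, here is a minimal sketch in plain Rust. It is not the code in `cursor.rs`: `ToyBatch`, `ToyCursor`, `encode_key`, and `merge_payloads` are hypothetical names, and a single `i32` key encoded as byte-comparable bytes stands in for the Arrow Row format. It only shows why a cursor holds both the original batch and the encoded keys, which is also why it uses more memory than the input batch alone.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for a batch: one i32 sort column plus a payload column.
struct ToyBatch {
    keys: Vec<i32>,
    payload: Vec<&'static str>,
}

/// Encode an i32 so that byte-wise comparison matches numeric order
/// (flip the sign bit, then store big-endian). This mimics the idea of a
/// row format; it is not the Arrow Row encoding itself.
fn encode_key(v: i32) -> [u8; 4] {
    ((v as u32) ^ 0x8000_0000).to_be_bytes()
}

/// Toy cursor: the original batch, its pre-encoded keys, and a position.
struct ToyCursor {
    batch: ToyBatch,
    rows: Vec<[u8; 4]>,
    offset: usize,
}

impl ToyCursor {
    fn new(batch: ToyBatch) -> Self {
        let rows = batch.keys.iter().map(|k| encode_key(*k)).collect();
        Self { batch, rows, offset: 0 }
    }

    fn is_finished(&self) -> bool {
        self.offset >= self.rows.len()
    }

    /// Compare the current entries of two cursors using only the encoded bytes.
    fn compare(&self, other: &Self) -> Ordering {
        self.rows[self.offset].cmp(&other.rows[other.offset])
    }

    /// Return the payload at the current position and advance by one row.
    fn take(&mut self) -> &'static str {
        let value = self.batch.payload[self.offset];
        self.offset += 1;
        value
    }
}

/// Two-way merge of sorted inputs, driven entirely by cursor comparisons.
fn merge_payloads(mut a: ToyCursor, mut b: ToyCursor) -> Vec<&'static str> {
    let mut out = Vec::new();
    while !a.is_finished() && !b.is_finished() {
        if a.compare(&b) != Ordering::Greater {
            out.push(a.take());
        } else {
            out.push(b.take());
        }
    }
    while !a.is_finished() {
        out.push(a.take());
    }
    while !b.is_finished() {
        out.push(b.take());
    }
    out
}
```

In this toy version each cursor stores 4 extra bytes per row on top of the batch; the real ratio depends on the sort key types and is what the new field is meant to capture.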
##########
datafusion/physical-plan/src/sorts/sort.rs:
##########
@@ -396,12 +455,30 @@ impl ExternalSorter {
Some((self.spill_manager.create_in_progress_file("Sorting")?, 0));
}
+ // Slice only the last batch if it's too large.
Review Comment:
Why is the last batch too large? Could this be fixed earlier, for example
by making sure the batches are around `batch_size` when
`globally_sorted_batches` is constructed?
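As a rough illustration of what "fix it earlier" could look like, the sketch below splits rows into chunks of at most `batch_size` at construction time, so no oversized batch reaches the spill step. It uses a plain `Vec` as a stand-in for a batch, and `split_into_batches` is a hypothetical helper, not a function in the PR. With real Arrow data the same idea would presumably use `RecordBatch::slice(offset, length)`, which avoids copying.

```rust
/// Split `rows` into chunks of at most `batch_size` rows each.
/// Hypothetical helper: a slice of rows stands in for a sorted batch.
fn split_into_batches<T: Clone>(rows: &[T], batch_size: usize) -> Vec<Vec<T>> {
    assert!(batch_size > 0, "batch_size must be non-zero");
    // `chunks` yields full chunks followed by one shorter remainder, if any.
    rows.chunks(batch_size).map(|chunk| chunk.to_vec()).collect()
}
```

Only the final chunk can be shorter than `batch_size`, and none can be larger, so a later "slice the last batch" step would not be needed under this assumption.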
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]