2010YOUY01 commented on code in PR #17163:
URL: https://github.com/apache/datafusion/pull/17163#discussion_r2286915308


##########
datafusion/physical-plan/src/sorts/streaming_merge.rs:
##########
@@ -88,6 +88,10 @@ pub struct StreamingMergeBuilder<'a> {
     fetch: Option<usize>,
     reservation: Option<MemoryReservation>,
     enable_round_robin_tie_breaker: bool,
+    /// Ratio of memory used by cursor batch to the original input RecordBatch.
+    /// Used in `get_reserved_byte_for_record_batch_size` to estimate required memory for merge phase.
+    /// Only passed when constructing MultiLevelMergeBuilder

Review Comment:
   ```suggestion
       /// Only passed when constructing MultiLevelMergeBuilder
       ///
       /// A cursor is an interface for comparing entries between two batches.
       /// It includes the original batch and the sort keys converted to Arrow Row format
       /// for faster comparison. See `cursor.rs` for more details.
   ```
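
   To illustrate why a row-encoded key makes comparison cheap, here is a toy sketch. It is not the Arrow Row format nor the code in `cursor.rs`; `ToyCursor` and `encode_i32` are hypothetical names, and the key is assumed to be two `i32` columns. The real cursor also keeps the original batch, which this sketch omits.

   ```rust
   use std::cmp::Ordering;

   /// Encode an i32 so that unsigned byte-wise comparison matches numeric order:
   /// flip the sign bit, then store big-endian.
   fn encode_i32(v: i32) -> [u8; 4] {
       ((v as u32) ^ 0x8000_0000).to_be_bytes()
   }

   /// Hypothetical stand-in for a cursor: one byte-comparable row per input row.
   struct ToyCursor {
       rows: Vec<Vec<u8>>,
   }

   impl ToyCursor {
       /// Convert a two-column (i32, i32) sort key into encoded rows once, up front.
       fn new(keys: &[(i32, i32)]) -> Self {
           let rows = keys
               .iter()
               .map(|&(a, b)| {
                   let mut row = Vec::with_capacity(8);
                   row.extend_from_slice(&encode_i32(a));
                   row.extend_from_slice(&encode_i32(b));
                   row
               })
               .collect();
           Self { rows }
       }

       /// Compare row `i` of this cursor with row `j` of another cursor.
       /// A single lexicographic byte comparison replaces per-column comparisons.
       fn compare(&self, i: usize, other: &ToyCursor, j: usize) -> Ordering {
           self.rows[i].cmp(&other.rows[j])
       }
   }
   ```

   The extra encoded rows are the reason a cursor uses more memory than the input batch alone, which is what the ratio field above is meant to account for.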



##########
datafusion/physical-plan/src/sorts/sort.rs:
##########
@@ -396,12 +455,30 @@ impl ExternalSorter {
                 Some((self.spill_manager.create_in_progress_file("Sorting")?, 0));
         }
 
+        // Slice only the last batch if it's too large.

Review Comment:
   Why is the last batch too large? Is it possible to fix this earlier, for example by
   making sure the batches are around `batch_size` when constructing `globally_sorted_batches`?
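
   A minimal sketch of what "fix it earlier" could look like, assuming a hypothetical helper `chunk_ranges` (not part of this PR): compute `(offset, length)` ranges of at most `batch_size` rows, then pass each range to `RecordBatch::slice(offset, length)` when the batches are built.

   ```rust
   /// Hypothetical helper: split `num_rows` into (offset, length) ranges,
   /// each covering at most `batch_size` rows.
   fn chunk_ranges(num_rows: usize, batch_size: usize) -> Vec<(usize, usize)> {
       assert!(batch_size > 0, "batch_size must be non-zero");
       (0..num_rows)
           .step_by(batch_size)
           // The last range is shorter when num_rows is not a multiple of batch_size.
           .map(|offset| (offset, batch_size.min(num_rows - offset)))
           .collect()
   }
   ```

   With every batch bounded this way up front, the special case for slicing only the last batch might become unnecessary.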



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
