Kontinuation commented on code in PR #14644:
URL: https://github.com/apache/datafusion/pull/14644#discussion_r1954496042


##########
datafusion/physical-plan/src/sorts/stream.rs:
##########
@@ -171,20 +173,32 @@ impl<T: CursorArray> std::fmt::Debug for 
FieldCursorStream<T> {
 }
 
 impl<T: CursorArray> FieldCursorStream<T> {
-    pub fn new(sort: PhysicalSortExpr, streams: 
Vec<SendableRecordBatchStream>) -> Self {
+    pub fn new(
+        sort: PhysicalSortExpr,
+        streams: Vec<SendableRecordBatchStream>,
+        reservation: MemoryReservation,
+    ) -> Self {
         let streams = streams.into_iter().map(|s| s.fuse()).collect();
         Self {
             sort,
             streams: FusedStreams(streams),
+            reservation,
             phantom: Default::default(),
         }
     }
 
     fn convert_batch(&mut self, batch: &RecordBatch) -> 
Result<ArrayValues<T::Values>> {
         let value = self.sort.expr.evaluate(batch)?;
         let array = value.into_array(batch.num_rows())?;
+        let size_in_mem = array.get_buffer_memory_size();
         let array = array.as_any().downcast_ref::<T>().expect("field values");
-        Ok(ArrayValues::new(self.sort.options, array))
+        let mut array_reservation = self.reservation.new_empty();
+        array_reservation.try_grow(size_in_mem)?;
+        Ok(ArrayValues::new(
+            self.sort.options,
+            array,
+            array_reservation,
+        ))

Review Comment:
   I think this reservation is needed.
   
   When the sort expression is a simple column reference, the `array` simply 
reuses the buffer in `batch`, this is the case where reservation is not needed. 
However, the sort expression can be a complex expression such as `l_linenumber 
+ 1`, the result of evaluation takes additional space in this case. Always 
reserving `array` is a more conservative approach that prevents allocations 
from overshooting the limit.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to