ding-young commented on code in PR #17163:
URL: https://github.com/apache/datafusion/pull/17163#discussion_r2300879395


##########
datafusion/physical-plan/src/sorts/sort.rs:
##########
@@ -782,20 +843,62 @@ impl ExternalSorter {
         &mut self,
         input: &RecordBatch,
     ) -> Result<()> {
-        let size = get_reserved_byte_for_record_batch(input);
+        let batch_size = get_reserved_byte_for_record_batch(input, None);
+        let sort_res = self.reservation.try_grow(batch_size);
+
+        // if cursor is smaller than half of original batch, we may say that 
2x batch is enough for both sort and merge phase
+        let cursor_small = self

Review Comment:
   The round up here is still a heuristic. It's hard to provide a strict 
guarantee that `StreamingMerge` won’t fail, even if we use the estimated ratio 
to increase `merge_reservation`. So for now, I kept the conservative approach 
using the original 2x ratio when the estimated ratio is < 1.0.It’s a temporary 
workaround and might be revised later.
   
   I still need to clean up some of the comments and test code, so it’d be 
great if you could take a look on the core logic for now.
   
   Side note: this PR does fix the sort-tpch Q5 issue mentioned above, but 
based on my recent investigation into the `StringViewArray` memory usage 
problem, this PR doesn’t seem to resolve the failures in other queries (Q3, 
Q7-11 that includes `VARCHAR` col) if I make the correct memory estimation on 
`StringViewArray`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to