pantShrey commented on code in PR #21882:
URL: https://github.com/apache/datafusion/pull/21882#discussion_r3443797123


##########
datafusion/physical-plan/src/repartition/mod.rs:
##########
@@ -3098,9 +3098,9 @@ mod test {
         let input_partitions = vec![partition1, partition2];
 
         // Set up context with tight memory limit to force spilling
-        // Sorting needs some non-spillable memory, so 64 bytes should force 
spilling while still allowing the query to complete
+        // Sorting needs some non-spillable memory, so 608 bytes should force 
spilling while still allowing the query to complete

Review Comment:
   I  touched on this earlier in the PR (see the thread with @adriangb above), 
but happy to revisit! The test is extremely sensitive to memory thresholds ,the 
`SpillWriteAdapter` and the async read infrastructure together appear to push 
the baseline up enough that the original 64B limit becomes too tight for the 
operator to initialize, but increasing the limit caused `RepartitionExec` and 
`RepartitionMerge` to compete for the pool in a way that didn't trigger 
reliable spilling. I'll be honest that I'm not fully confident in exactly how 
much each piece contributes to the memory shift, which is part of why I 
preserved the semantic intent of the test (a spill occurs and output is 
correctly sorted) rather than trying to nail down a precise new threshold. I'd 
really appreciate your guidance on the best approach here, happy to go 
whichever direction you think is cleanest.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to