[GitHub] [arrow-datafusion] yjshen commented on a change in pull request #1921: Add timeout to can_grow_directly when waiting for the Condvar.

GitBox Fri, 04 Mar 2022 19:12:28 -0800


yjshen commented on a change in pull request #1921:
URL: https://github.com/apache/arrow-datafusion/pull/1921#discussion_r820027341




##########
File path: datafusion/src/execution/memory_manager.rs
##########
@@ -340,7 +341,13 @@ impl MemoryManager {
             } else if current < min_per_rqt {
                 // if we cannot acquire at lease 1/2n memory, just wait for 
others
                 // to spill instead spill self frequently with limited total 
mem
-                self.cv.wait(&mut rqt_current_used);
+                let timeout = self
+                    .cv
+                    .wait_for(&mut rqt_current_used, Duration::from_secs(5));

Review comment:
       And if we block a consumer belongs to a task, we are blocking the task 
for one partition as well, no? 
   
   > Actually this is better mode in my mind for the consumer model. I actually 
plan to make similar changes to this part of datafusion.
   
   Spilling other consumers of the same task in Spark is not well defined and 
also not supported by all consumers (you can search `trigger != this` to 
check). That's why I think it brings complexity over the gains, and we 
gradually clearer the current approach: having two kinds of consumers, tracking 
consumers and requesting consumers, to clarity memory usage pattern. Besides, 
Spark also suffers in memory controlling that users always need to tune its 
memoryFraction by reserving more memory for the unamed ones. 
   
   Given external sort as an example, it's a requesting consumer as it's 
accumulating incoming records (and this is where n from 1/2n from, we only 
count for requesting consumers while spilling), and when it finished sort and 
start to output records, it release all it's memory and surrender the memory to 
a tracking consumer. That's why I am asking if a inifite waiting is witnessed 
at the first place. Since the case should be rare and a big chance to be a bug 
if so.
   
   Please refer to https://github.com/apache/arrow-datafusion/issues/587, 
discussions in https://github.com/apache/arrow-datafusion/pull/1526 and 
https://github.com/apache/arrow-datafusion/pull/1691 for more background if you 
like. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow-datafusion] yjshen commented on a change in pull request #1921: Add timeout to can_grow_directly when waiting for the Condvar.

Reply via email to