[GitHub] [arrow-datafusion] viirya commented on a change in pull request #1921: Add timeout to can_grow_directly when waiting for the Condvar.

GitBox Fri, 04 Mar 2022 19:57:05 -0800


viirya commented on a change in pull request #1921:
URL: https://github.com/apache/arrow-datafusion/pull/1921#discussion_r820035032




##########
File path: datafusion/src/execution/memory_manager.rs
##########
@@ -340,7 +341,13 @@ impl MemoryManager {
             } else if current < min_per_rqt {
                 // if we cannot acquire at lease 1/2n memory, just wait for 
others
                 // to spill instead spill self frequently with limited total 
mem
-                self.cv.wait(&mut rqt_current_used);
+                let timeout = self
+                    .cv
+                    .wait_for(&mut rqt_current_used, Duration::from_secs(5));

Review comment:
       > And if we block a consumer belongs to a task, we are blocking the task 
for one partition as well, no?
   
   You don't get my point above. There are two difference situations, one is 
between tasks and one is between consumers in same task. In Spark codebase, a 
task could be blocked when it cannot requires necessary memory and it will wait 
infinitely for other tasks to release memory (i.e., the code you quoted). This 
makes sense. One task could be blocked due to insufficient memory and it just 
waits for other tasks to finish and release memory. It also doesn't make sense 
to ask other tasks spilling for the task.
   
   For the consumers in same task, Spark doesn't do blocking for a consumer 
which cannot require its memory but actively asks other consumers (and also the 
consumer itself but at the last one) to do spilling and release memory if 
possible.
   
   The point is not whether block the consumer but when it will be unblocked. 
The code here seems to block a consumer and passively wait infinitely for other 
consumers. How do we know when other consumers will release? If no consumers 
release? Then the task is just stuck there. All consumers in same task have the 
same goal: finish the task. They should help others finish their jobs, by 
spilling and releasing memory when possible if other consumers need.  Maybe 
there are some concerns I don't get so far, but that's why I think the current 
model looks strange to me.
   
   I'm not arguing about the tracking consumers and requesting consumers. This 
looks clever to me too. So your reply above I have no question about it. But 
seems it's not related to the issue I'm talking in this PR.
   




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [arrow-datafusion] viirya commented on a change in pull request #1921: Add timeout to can_grow_directly when waiting for the Condvar.

Reply via email to