viirya commented on a change in pull request #1921:
URL: https://github.com/apache/arrow-datafusion/pull/1921#discussion_r820035032
##########
File path: datafusion/src/execution/memory_manager.rs
##########
@@ -340,7 +341,13 @@ impl MemoryManager {
} else if current < min_per_rqt {
// if we cannot acquire at lease 1/2n memory, just wait for
others
// to spill instead spill self frequently with limited total
mem
- self.cv.wait(&mut rqt_current_used);
+ let timeout = self
+ .cv
+ .wait_for(&mut rqt_current_used, Duration::from_secs(5));
Review comment:
> And if we block a consumer belongs to a task, we are blocking the task
for one partition as well, no?
You don't get my point above. There are two difference situations, one is
between tasks and one is between consumers in same task. In Spark codebase, a
task could be blocked when it cannot requires necessary memory and it will wait
infinitely for other tasks to release memory (i.e., the code you quoted). This
makes sense. One task could be blocked due to insufficient memory and it just
waits for other tasks to finish and release memory. It also doesn't make sense
to ask other tasks spilling for the task.
For the consumers in same task, Spark doesn't do blocking for a consumer
which cannot require its memory but actively asks other consumers (and also the
consumer itself but at the last one) to do spilling and release memory if
possible.
The point is not whether block the consumer but when it will be unblocked.
The code here seems to block a consumer and passively wait infinitely for other
consumers. How do we know when other consumers will release? If no consumers
release? Then the task is just stuck there. All consumers in same task have the
same goal: finish the task. They should help others finish their jobs, by
spilling and releasing memory when possible if other consumers need. Maybe
there are some concerns I don't get so far, but that's why I think the current
model looks strange to me.
I'm not arguing about the tracking consumers and requesting consumers. This
looks clever to me too. So your reply above I have no question about it. But
seems it's not related to the issue I'm talking in this PR.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]