korowa commented on code in PR #13751:
URL: https://github.com/apache/datafusion/pull/13751#discussion_r1885543455
##########
datafusion/physical-plan/src/joins/hash_join.rs:
##########
@@ -90,9 +90,6 @@ struct JoinLeftData {
/// Counter of running probe-threads, potentially
/// able to update `visited_indices_bitmap`
probe_threads_counter: AtomicUsize,
- /// Memory reservation that tracks memory used by `hash_map` hash table
- /// `batch`. Cleared on drop.
- _reservation: MemoryReservation,
Review Comment:
Because all operators streams are running simultaneously, consuming each
other and requesting memory from the pool. For the query above (`select * from
a join b order by a.field`) the timeline will be roughly like following:
1) Sort stream is started and waiting for the Join results
2) Join stream is started, and in order to produce some output it collects
build side data reserving memory in the pool
3) Join stream completed with build side and produces the output batch
(build side is still alive since it's required for producing join output)
4) Sort stream receives input batch from the join, and starts accumulating +
sorting the data, making memory reservations in memory pool
5) steps 3-4 are repeated multiple times, while more and more memory being
reserved by Sort (since it needs to read and sort all the input before
producing its output)
6) Join completes processing of probe side, returns None
7) Sort completes processing of join output, finishes processing of
internally accumulated data, and gets destroyed freeing OS memory
8) Join is not required anymore, and it also dropped freeing OS memory
Previously, the memory in the pool for join build side was "freed" on step
8, so while sort operator requested for additional memory, there was
information in the pool that build side data still exists and takes N bytes,
and now, after this patch, the reservation is deleted in the end of step 2.
So, now when Sort will start requesting for the memory, DF will report that
memory pool is empty, while build side still exists in OS memory, which is a
potential source of OOM -- e.g:
- running on 5G memory machine
- having memory pool set to 4G
- having build side with the size of 3G
- having probe side with the size of 10G
At the start of Sort input processing (step 4 from above) DF will think that
there are 4G available in the pool (while actually OS will have 2G available,
due to in-memory join build side), so the sort will try to accumulate up to 4G
before spilling, and likely will be OOM killed after reading ~2G of data (when
OS memory will be exhausted).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]