korowa commented on code in PR #13751: URL: https://github.com/apache/datafusion/pull/13751#discussion_r1885543455
##########
datafusion/physical-plan/src/joins/hash_join.rs:
##########
@@ -90,9 +90,6 @@ struct JoinLeftData {
     /// Counter of running probe-threads, potentially
     /// able to update `visited_indices_bitmap`
     probe_threads_counter: AtomicUsize,
-    /// Memory reservation that tracks memory used by `hash_map` hash table
-    /// `batch`. Cleared on drop.
-    _reservation: MemoryReservation,

Review Comment:
Because all operator streams run simultaneously, consuming each other's output and requesting memory from the pool. For the query above (`select * from a join b order by a.field`) the timeline is roughly as follows:

1) The Sort stream starts and waits for the Join results.
2) The Join stream starts and, in order to produce any output, collects the build-side data, reserving memory in the pool.
3) The Join stream finishes the build side and produces an output batch (the build side stays alive, since it is required to produce join output).
4) The Sort stream receives an input batch from the join and starts accumulating and sorting the data, making memory reservations in the memory pool.
5) Steps 3-4 repeat many times, with Sort reserving more and more memory (it has to read and sort all of its input before producing any output).
6) Join finishes processing the probe side and returns None.
7) Sort finishes processing the join output, finishes processing its internally accumulated data, and is destroyed, freeing OS memory.
8) Join is no longer needed, so it is dropped as well, freeing its OS memory.

Previously, the pool memory for the join build side was "freed" at step 8, so whenever the sort operator requested additional memory, the pool still recorded that the build-side data exists and takes N bytes. Now, after this patch, the reservation is deleted at the end of step 2.
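To make the lifetime difference concrete, here is a minimal, std-only sketch. `ToyPool`, `ToyReservation`, `BuildSide` and `available_during_probe` are hypothetical names invented for illustration, not DataFusion's API; the sketch only assumes that a reservation returns its bytes to the pool when dropped, as the removed doc comment ("Cleared on drop") says.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Toy stand-in for a memory pool (hypothetical, not DataFusion's type):
/// it only counts reserved bytes against a fixed limit.
#[derive(Clone)]
struct ToyPool {
    limit: usize,
    reserved: Arc<AtomicUsize>,
}

impl ToyPool {
    fn new(limit: usize) -> Self {
        Self { limit, reserved: Arc::new(AtomicUsize::new(0)) }
    }

    /// Bytes the pool believes are still free.
    fn available(&self) -> usize {
        self.limit - self.reserved.load(Ordering::SeqCst)
    }

    /// Reserve `bytes`, or return None if the limit would be exceeded.
    /// The check and the update are not one atomic step; that is fine for
    /// this single-threaded sketch.
    fn reserve(&self, bytes: usize) -> Option<ToyReservation> {
        if bytes > self.available() {
            return None;
        }
        self.reserved.fetch_add(bytes, Ordering::SeqCst);
        Some(ToyReservation { pool: self.clone(), bytes })
    }
}

/// RAII guard: the bytes go back to the pool when the guard is dropped.
struct ToyReservation {
    pool: ToyPool,
    bytes: usize,
}

impl Drop for ToyReservation {
    fn drop(&mut self) {
        self.pool.reserved.fetch_sub(self.bytes, Ordering::SeqCst);
    }
}

/// Build side that owns its reservation, so both die together.
struct BuildSide {
    data: Vec<u8>,
    _reservation: ToyReservation,
}

/// What the pool reports as available while the build data is still alive.
fn available_during_probe(pool: &ToyPool, build_bytes: usize, keep_reservation: bool) -> usize {
    let reservation = pool.reserve(build_bytes).expect("build side fits in the pool");
    let data = vec![0u8; build_bytes];
    if keep_reservation {
        // Reservation lives inside the build side: accounting matches reality.
        let build = BuildSide { data, _reservation: reservation };
        assert_eq!(build.data.len(), build_bytes);
        pool.available()
    } else {
        // Reservation released right after the build: the data is still
        // allocated, but the pool no longer knows about it.
        drop(reservation);
        assert_eq!(data.len(), build_bytes);
        pool.available()
    }
}

fn main() {
    let pool = ToyPool::new(4_000);
    println!("reservation kept:  {}", available_during_probe(&pool, 3_000, true));
    println!("reservation freed: {}", available_during_probe(&pool, 3_000, false));
}
```

With the reservation kept inside the build-side struct, the pool reports 1000 bytes available while the build data is alive; with the reservation released early, it reports the full 4000 even though the 3000-byte buffer is still allocated.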
So now, when Sort starts requesting memory, DF will report that the memory pool is empty while the build side still exists in OS memory, which is a potential source of OOM. For example:

- running on a machine with 5G of memory
- with the memory pool set to 4G
- with a build side of 3G
- with a probe side of 10G

At the start of Sort input processing (step 4 above), DF will think that 4G is available in the pool (while the OS actually has only 2G available, because of the in-memory join build side), so the sort will try to accumulate up to 4G before spilling and will likely be OOM-killed after reading ~2G of data (when OS memory is exhausted).
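The arithmetic of this example can be written out as a small sketch. `Scenario` and its methods are hypothetical names used only for illustration, and the model is deliberately simplified: it assumes the build side is the only untracked allocation and ignores all other process memory.

```rust
const GB: u64 = 1 << 30;

/// Hypothetical model of the example above; not part of DataFusion.
struct Scenario {
    machine: u64,        // physical memory on the machine
    pool_limit: u64,     // configured memory pool size
    build_side: u64,     // in-memory join build side
    build_tracked: bool, // is the build side still reserved in the pool?
}

impl Scenario {
    /// How much the pool lets Sort accumulate before it has to spill.
    fn sort_budget(&self) -> u64 {
        let tracked = if self.build_tracked { self.build_side } else { 0 };
        self.pool_limit.saturating_sub(tracked)
    }

    /// Physical memory actually left once the build side is resident.
    fn physical_free(&self) -> u64 {
        self.machine.saturating_sub(self.build_side)
    }

    /// Sort can exhaust OS memory if its budget exceeds what is physically free.
    fn sort_can_oom(&self) -> bool {
        self.sort_budget() > self.physical_free()
    }
}

fn main() {
    for build_tracked in [true, false] {
        let s = Scenario {
            machine: 5 * GB,
            pool_limit: 4 * GB,
            build_side: 3 * GB,
            build_tracked,
        };
        println!(
            "tracked={}: sort budget {}G, physically free {}G, OOM risk: {}",
            build_tracked,
            s.sort_budget() / GB,
            s.physical_free() / GB,
            s.sort_can_oom()
        );
    }
}
```

In this model, keeping the build side tracked leaves Sort a 1G budget against 2G of physically free memory, so it spills before the machine runs out. Without tracking, the budget is 4G against the same 2G, which is the OOM case described above.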