korowa commented on issue #1599:
URL: 
https://github.com/apache/arrow-datafusion/issues/1599#issuecomment-1477399197

   Looks like now that we are able to fail a query when it breaches the memory 
limit, it's the right time to start working on spills.
   
   Taking into account what has been written above, I guess the next step could 
be to implement spilling for MergeJoin -- if our final intention is to have 
runtime HJ -> MJ conversion, it would be nice to have some guarantee that MJ 
won't fail for the same reason. I believe the MJ spilling logic could be pretty 
straightforward, without any pitfalls: the naive approach would be to spill 
buffered-side data to IPC files batch by batch; a more complex, and probably 
more effective, approach would be to spill the concatenation of all batches 
that fit in memory.
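   To make the naive policy concrete, here is a minimal stand-alone sketch of 
batch-by-batch spilling for the buffered side. All names here (`Batch`, 
`BufferedSide`, the byte-based accounting) are illustrative stand-ins, not the 
actual DataFusion types -- a real implementation would buffer Arrow 
`RecordBatch`es and append the overflow to an `.ipc` file:

   ```rust
// Hypothetical sketch: `Batch` stands in for Arrow's RecordBatch, and the
// byte counter stands in for DataFusion's memory reservation machinery.
type Batch = Vec<i64>;

struct BufferedSide {
    budget_bytes: usize,   // memory the operator may hold before spilling
    used_bytes: usize,     // bytes currently buffered in memory
    in_memory: Vec<Batch>, // batches that still fit
    spilled: Vec<Batch>,   // stand-in for batches written to an IPC file
}

impl BufferedSide {
    fn new(budget_bytes: usize) -> Self {
        Self { budget_bytes, used_bytes: 0, in_memory: Vec::new(), spilled: Vec::new() }
    }

    /// Naive policy: once the budget would be breached, each incoming batch
    /// is "written out" immediately instead of being buffered.
    fn push(&mut self, batch: Batch) {
        let size = batch.len() * std::mem::size_of::<i64>();
        if self.used_bytes + size <= self.budget_bytes {
            self.used_bytes += size;
            self.in_memory.push(batch);
        } else {
            self.spilled.push(batch); // in reality: append to an .ipc file
        }
    }
}

fn main() {
    // 100-byte budget; each 4-row batch of i64 is 32 bytes.
    let mut side = BufferedSide::new(100);
    for start in (0..20).step_by(4) {
        side.push((start..start + 4).collect());
    }
    // Three batches (96 bytes) stay in memory, the remaining two spill.
    println!("in_memory={} spilled={}", side.in_memory.len(), side.spilled.len());
}
   ```

   The concatenate-then-spill variant would differ only in the `else` branch: 
instead of writing the incoming batch alone, it would flush everything 
currently in `in_memory` as one large run before continuing.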
   
   After that we could follow up with what is mentioned in the issue 
description -- HJ -> MJ conversion (I believe #2628 is worth mentioning here, 
to unlock the ability for more hash joins to be converted), and spilling 
mechanisms for other join implementations.
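   The runtime HJ -> MJ conversion could then be little more than a fallback 
on the memory-limit error. A minimal sketch, with all types and the 
per-row cost model invented purely for illustration:

   ```rust
// Hypothetical fallback: try the hash join first; if it reports a
// memory-limit error, retry with the (spillable) merge join.
#[derive(Debug)]
enum JoinError {
    MemoryExceeded,
}

fn hash_join(rows: usize, budget_bytes: usize) -> Result<String, JoinError> {
    // Pretend the hash table needs ~16 bytes per build-side row.
    if rows * 16 > budget_bytes {
        Err(JoinError::MemoryExceeded)
    } else {
        Ok(format!("hash join over {rows} rows"))
    }
}

fn merge_join(rows: usize) -> String {
    // Assumed never to fail on memory, because it can spill.
    format!("merge join over {rows} rows")
}

fn execute_join(rows: usize, budget_bytes: usize) -> String {
    match hash_join(rows, budget_bytes) {
        Ok(result) => result,
        Err(JoinError::MemoryExceeded) => merge_join(rows),
    }
}

fn main() {
    println!("{}", execute_join(4, 1024)); // fits: hash join runs
    println!("{}", execute_join(1000, 1024)); // breaches: falls back to MJ
}
   ```

   This is exactly why MJ spilling seems worth doing first: the fallback arm 
only provides a guarantee if the operator it falls back to cannot itself fail 
on the same limit.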
   
   If this plan is fine, I'd like to take a stab at MJ spilling.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
