korowa commented on issue #5220:
URL: 
https://github.com/apache/arrow-datafusion/issues/5220#issuecomment-1474225900

   Actually, there is also `SymmetricHashJoinExec`, but I suppose its memory 
management is a bit more complicated and may be out of scope for this issue:
   - It is somewhat isolated: at the moment, the DataFusion planner has no 
way to plan this operator
   - I don't think that simply throwing an error and aborting execution is 
acceptable for `SymmetricHashJoinExec`. If my understanding is correct, the 
main use case for this operator is joining two unbounded sources (streaming 
jobs), so it doesn't make much sense to limit memory without a spilling 
fallback (subjectively, it doesn't seem correct to fail a data streaming job 
because of an attempted memory overallocation)
   
   My proposal here would be to file a separate issue for 
`SymmetricHashJoinExec` memory management and (as I see it) implement the 
memory limit together with data spilling. Maybe we can take it up once we have 
reliable spilling for `HashJoinExec`; however, doing it earlier is also an 
option.
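
To make the "spill instead of fail" idea concrete, here is a minimal, std-only Rust sketch. It is not DataFusion code: `Budget` and `JoinBuffer` are hypothetical names invented for illustration, and the `spilled` vector merely stands in for batches written to disk. The point is only the control flow: a failed reservation triggers a spill and a retry, and an error surfaces only if the batch cannot fit even after spilling.

```rust
/// Hypothetical memory budget (an assumption for this sketch,
/// not DataFusion's memory pool API).
struct Budget {
    limit: usize,
    used: usize,
}

impl Budget {
    /// Reserve `bytes`, or return an error if the limit would be exceeded.
    fn try_grow(&mut self, bytes: usize) -> Result<(), String> {
        if self.used + bytes > self.limit {
            Err(format!(
                "cannot reserve {} bytes: {} of {} already in use",
                bytes, self.used, self.limit
            ))
        } else {
            self.used += bytes;
            Ok(())
        }
    }

    /// Release `bytes` back to the budget.
    fn shrink(&mut self, bytes: usize) {
        self.used = self.used.saturating_sub(bytes);
    }
}

/// Hypothetical buffer for one side of a streaming join.
struct JoinBuffer {
    in_memory: Vec<Vec<u8>>,
    /// Stand-in for batches spilled to disk.
    spilled: Vec<Vec<u8>>,
}

impl JoinBuffer {
    /// Buffer a batch; on memory pressure, spill and retry instead of failing.
    fn insert(&mut self, batch: Vec<u8>, budget: &mut Budget) -> Result<(), String> {
        if budget.try_grow(batch.len()).is_err() {
            // Fallback path: free memory by spilling, then retry once.
            self.spill_all(budget);
            // Still failing means the batch alone exceeds the limit.
            budget.try_grow(batch.len())?;
        }
        self.in_memory.push(batch);
        Ok(())
    }

    /// Move every in-memory batch to the spill area and release its memory.
    fn spill_all(&mut self, budget: &mut Budget) {
        let freed: usize = self.in_memory.iter().map(|b| b.len()).sum();
        self.spilled.append(&mut self.in_memory);
        budget.shrink(freed);
    }
}
```

A real operator would also have to read spilled data back during probing and decide which batches to evict; the sketch deliberately leaves both out.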
   
   @ozankabak, @metesynnada it would be great to hear your thoughts on it.

