korowa commented on issue #5220: URL: https://github.com/apache/arrow-datafusion/issues/5220#issuecomment-1474225900
Actually, there is also `SymmetricHashJoinExec`, but I suppose its memory management is a bit more complicated and may be out of the scope of this issue:

- It's somewhat isolated: at the moment there are no options for planning this operator with the DataFusion planner.
- I don't think that simply throwing an error and aborting execution is acceptable for `SymmetricHashJoinExec`. If my understanding is correct, the main use case for this operator is joining two unbounded sources (streaming jobs), and from this point of view it doesn't make much sense to limit memory without any spilling fallback (subjectively, it doesn't seem correct to fail a data streaming job on a memory overallocation attempt).

My proposal here would be to file a separate issue for `SymmetricHashJoinExec` memory management and, as I see it, implement memory limitation along with data spilling. Maybe we can go for it once we have reliable spilling for `HashJoinExec`; however, doing it before that is also an option.

@ozankabak, @metesynnada it would be great to hear your thoughts on it.
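
To illustrate the distinction drawn in this comment between failing on a memory limit and falling back to spilling, here is a minimal, self-contained Rust sketch. It does not use DataFusion's actual memory management API; `Budget`, `OnLimit`, and `JoinBuffer` are hypothetical names invented for this example, and the "spill" is just a second in-memory vector standing in for files on disk.

```rust
/// Hypothetical fixed-size memory budget (NOT DataFusion's real API).
pub struct Budget {
    limit: usize,
    used: usize,
}

impl Budget {
    pub fn new(limit: usize) -> Self {
        Self { limit, used: 0 }
    }

    /// Reserve `bytes`, or return an error if that would exceed the limit.
    pub fn try_grow(&mut self, bytes: usize) -> Result<(), String> {
        if self.used + bytes > self.limit {
            return Err(format!(
                "cannot reserve {} bytes: {} of {} already in use",
                bytes, self.used, self.limit
            ));
        }
        self.used += bytes;
        Ok(())
    }

    /// Release a previous reservation.
    pub fn shrink(&mut self, bytes: usize) {
        self.used = self.used.saturating_sub(bytes);
    }
}

/// What to do when a reservation fails (hypothetical policy switch).
#[derive(Clone, Copy, PartialEq)]
pub enum OnLimit {
    /// Abort with an error: tolerable for a bounded batch query.
    Fail,
    /// Move buffered data out of memory and retry: what a streaming join needs.
    Spill,
}

/// Hypothetical buffer for one side of a join.
pub struct JoinBuffer {
    budget: Budget,
    policy: OnLimit,
    in_memory: Vec<Vec<u8>>,
    spilled: Vec<Vec<u8>>, // stand-in for data written to disk
}

impl JoinBuffer {
    pub fn new(limit: usize, policy: OnLimit) -> Self {
        Self { budget: Budget::new(limit), policy, in_memory: Vec::new(), spilled: Vec::new() }
    }

    pub fn push(&mut self, batch: Vec<u8>) -> Result<(), String> {
        let size = batch.len();
        let err = match self.budget.try_grow(size) {
            Ok(()) => {
                self.in_memory.push(batch);
                return Ok(());
            }
            Err(e) => e,
        };
        match self.policy {
            OnLimit::Fail => Err(err),
            OnLimit::Spill => {
                // Evict everything buffered so far and release its reservation.
                for old in self.in_memory.drain(..) {
                    self.budget.shrink(old.len());
                    self.spilled.push(old);
                }
                // Retry; this still fails if one batch alone exceeds the limit.
                self.budget.try_grow(size)?;
                self.in_memory.push(batch);
                Ok(())
            }
        }
    }

    pub fn spilled_batches(&self) -> usize {
        self.spilled.len()
    }

    pub fn memory_used(&self) -> usize {
        self.budget.used
    }
}
```

With a 10-byte limit and two 6-byte batches, the `Fail` policy rejects the second `push`, while the `Spill` policy evicts the first batch and accepts the second. A real implementation would also have to read spilled data back during probing, which this sketch leaves out.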
