yjshen commented on issue #1599: URL: https://github.com/apache/arrow-datafusion/issues/1599#issuecomment-1100042115
I would propose we do this in several steps:

- [ ] Provide a classic SortMergeJoin implementation that is less memory-bound itself, moving the need for memory management to the sort operator, where memory is already controlled.
- [ ] Follow-up choice 1: Consolidate HashJoin and SortMergeJoin into a unified JoinExec, and do adaptive execution as @alamb suggested above.
- [ ] Follow-up choice 2: Incorporate a cost-based join optimizer to choose the most suitable physical plan: sort-based or hash-based.
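
To illustrate why the first step shifts memory pressure onto the sort operator, here is a minimal sketch of the merge phase of a classic sort-merge inner join. It assumes both inputs arrive already sorted by key, so the join itself only needs to hold the current run of equal keys from each side. The function name `sort_merge_inner_join` and the tuple-based row representation are hypothetical and chosen for illustration; this is not DataFusion's actual operator or API.

```rust
use std::cmp::Ordering;

/// Inner-join two inputs that are ALREADY sorted by key (assumption).
/// Hypothetical helper for illustration only; not part of DataFusion.
fn sort_merge_inner_join<K: Ord + Copy, L: Copy, R: Copy>(
    left: &[(K, L)],
    right: &[(K, R)],
) -> Vec<(K, L, R)> {
    let mut out = Vec::new();
    let (mut i, mut j) = (0, 0);
    while i < left.len() && j < right.len() {
        match left[i].0.cmp(&right[j].0) {
            // Advance whichever side currently has the smaller key.
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => {
                let key = left[i].0;
                // Find the end of the run of equal keys on each side.
                let i_end = i + left[i..].iter().take_while(|p| p.0 == key).count();
                let j_end = j + right[j..].iter().take_while(|p| p.0 == key).count();
                // Emit the cross product of the two runs.
                for &(_, l) in &left[i..i_end] {
                    for &(_, r) in &right[j..j_end] {
                        out.push((key, l, r));
                    }
                }
                i = i_end;
                j = j_end;
            }
        }
    }
    out
}
```

Calling it with two key-sorted slices returns the matching rows in key order. In a real operator the inputs would be streams of record batches, and only the buffered runs of duplicate keys would need to be accounted for by the join itself.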