mingmwang commented on issue #4139: URL: https://github.com/apache/arrow-datafusion/issues/4139#issuecomment-1306856449
@alamb @andygrove @isidentical @Dandandan @yahoNanJing I plan to add a new physical optimizer rule that selects the physical join implementation based on statistics. The basic idea: if one side (left or right) is small enough (below a threshold on size or row count, for example 10M), the rule will choose HashJoin: CollectLeft. Otherwise, if neither side is that small but the input is still below the hash join threshold (for example, estimated_size / partition_count <= 128M), it will choose HashJoin: Partitioned. Otherwise, it will choose SortMergeJoin. Please share your thoughts.
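
To make the proposed decision order concrete, here is a minimal sketch in Rust. It is not DataFusion code: `JoinStrategy`, `InputStats`, `select_join_strategy`, and both constants are hypothetical names for illustration. It also assumes several things the comment leaves open: the thresholds are byte sizes (10 MiB and 128 MiB), the per-partition check uses the smaller of the two inputs, and missing statistics fall through to SortMergeJoin.

```rust
/// Hypothetical join strategies, named after the three options in the proposal.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JoinStrategy {
    HashJoinCollectLeft,
    HashJoinPartitioned,
    SortMergeJoin,
}

/// Hypothetical per-input statistics. `None` means the size estimate is unknown.
#[derive(Debug, Clone, Copy)]
struct InputStats {
    estimated_size: Option<u64>,
}

/// Assumed threshold for "small enough" (the comment's "10M", read here as bytes).
const COLLECT_LEFT_THRESHOLD: u64 = 10 * 1024 * 1024;
/// Assumed per-partition threshold (the comment's "128M", read here as bytes).
const HASH_JOIN_PARTITION_THRESHOLD: u64 = 128 * 1024 * 1024;

/// Picks a join strategy in the order the proposal describes.
fn select_join_strategy(
    left: InputStats,
    right: InputStats,
    partition_count: u64,
) -> JoinStrategy {
    // Step 1: either side small enough => collect it and use a hash join.
    // An unknown size is never treated as small.
    let is_small = |s: InputStats| {
        s.estimated_size
            .map_or(false, |size| size <= COLLECT_LEFT_THRESHOLD)
    };
    if is_small(left) || is_small(right) {
        return JoinStrategy::HashJoinCollectLeft;
    }

    // Step 2: per-partition size within the hash join threshold => partitioned hash join.
    // Assumption: the smaller input is the one that gets hashed.
    let smaller_side = match (left.estimated_size, right.estimated_size) {
        (Some(l), Some(r)) => Some(l.min(r)),
        _ => None, // Assumption: without both estimates, skip this step.
    };
    if let Some(size) = smaller_side {
        // Guard against a zero partition count.
        let per_partition = size / partition_count.max(1);
        if per_partition <= HASH_JOIN_PARTITION_THRESHOLD {
            return JoinStrategy::HashJoinPartitioned;
        }
    }

    // Step 3: everything else falls back to sort-merge join.
    JoinStrategy::SortMergeJoin
}
```

With 8 partitions, a 1 MiB input joined to a 10 GiB input selects `HashJoinCollectLeft`, two 512 MiB inputs select `HashJoinPartitioned` (64 MiB per partition), and two 10 GiB inputs select `SortMergeJoin`. The sketch does not model swapping the inputs when the right side is the small one, which a real rule would likely need so that the small side ends up on the left.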
