Dandandan commented on issue #4139: URL: https://github.com/apache/arrow-datafusion/issues/4139#issuecomment-1306880712
Sounds like a good plan. For hash join, this probably needs some benchmarking to figure out good defaults and avoid performance degradation. `CollectLeft` limits the amount of parallelization on the left side: building the hash table is relatively expensive and is done (at least currently) in a single thread. In quite a few cases it might be more beneficial to do a (local) hash repartitioning, which is relatively cheap. It also depends on the size of the probe/right side: if that's e.g. >100x as big as the left side, it might be beneficial to avoid the hash repartitioning on the right side by switching to `CollectLeft`.
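
To make the size-ratio idea concrete, here is a rough sketch of how such a heuristic could look. Everything in it is an assumption for illustration: `JoinMode`, `choose_join_mode`, and `SIZE_RATIO_THRESHOLD` are hypothetical names, not DataFusion's API, and the 100x value is just the ballpark from the comment above, which would need benchmarking before becoming a default.

```rust
/// Hypothetical stand-in for the join partitioning choice discussed above.
/// This is not DataFusion's actual type.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum JoinMode {
    /// Build one hash table from the whole left side (single-threaded build).
    CollectLeft,
    /// Hash-repartition both sides and build one hash table per partition.
    Partitioned,
}

/// Assumed threshold: the "e.g. >100x" figure from the comment, not a tuned value.
pub const SIZE_RATIO_THRESHOLD: usize = 100;

/// Pick a join mode from estimated row counts (`None` means unknown).
///
/// Falls back to `Partitioned` unless both estimates are known and the
/// probe/right side is more than `SIZE_RATIO_THRESHOLD` times the left side,
/// in which case skipping the right-side repartitioning may pay off.
pub fn choose_join_mode(left_rows: Option<usize>, right_rows: Option<usize>) -> JoinMode {
    match (left_rows, right_rows) {
        (Some(left), Some(right)) if right > left.saturating_mul(SIZE_RATIO_THRESHOLD) => {
            JoinMode::CollectLeft
        }
        _ => JoinMode::Partitioned,
    }
}
```

With this sketch, a 1,000-row left side against a 200,000-row right side would pick `CollectLeft`, while missing statistics or a smaller ratio would keep the repartitioned plan. A real rule would probably also want an absolute cap on the left side's size, since the whole build side has to fit in one hash table.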
