gabotechs commented on PR #19761: URL: https://github.com/apache/datafusion/pull/19761#issuecomment-3744418572
> When this is implemented, we might want to look at hash_join_single_partition_threshold and hash_join_single_partition_threshold_rows again which could be reduced to make most joins run fully in parallel.

I do expect buffering to have a positive impact even if all the optimizations you mentioned are shipped. Buffering has a much greater impact in real scenarios, where the IO component is far heavier because data might be stored in a bucket or behind a remote resource such as an API. I was actually surprised to see a non-negligible impact when running benchmarks against local files.

Regardless of the order of events, this PR still needs work: it should not cause slowdowns in any of the current benchmarks.
