adriangb commented on PR #17452: URL: https://github.com/apache/datafusion/pull/17452#issuecomment-3842543791
@Dandandan could you explain your intuition for why this change would introduce regressions? (I'm not saying it didn't; we have to benchmark to confirm.) Our intuition for why it would not is that all build-side partitions should finish at roughly the same time, since data is distributed randomly among them as long as the join key is also random. That would not hold if, e.g., there are 2 join keys, one going into each of 2 partitions, and one has a lot more rows than the other on the build side. But even then I think overall query performance would only suffer when the probe-side data sizes are flipped, i.e. part1 has a small build side and a large probe side, and part2 is the opposite.
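To make the skew argument concrete, here is a rough toy model, not DataFusion code. It assumes each partition runs on its own core, that durations are proportional to row counts, and that the change makes every partition's probe phase wait until all build partitions are done (that last point is my reading of the discussion, not something stated above). The names `finish_times` and `query_time` and the numbers are made up for illustration.

```python
def finish_times(build, probe, wait_for_all_builds):
    """Per-partition finish times for a partitioned hash join.

    build[i] / probe[i] are hypothetical durations of partition i's
    build and probe phases. Partitions are assumed to run in parallel.
    """
    barrier = max(build)  # time at which the slowest build partition finishes
    # With the barrier, every probe starts at `barrier`;
    # without it, each probe starts when its own build finishes.
    starts = [barrier if wait_for_all_builds else b for b in build]
    return [start + p for start, p in zip(starts, probe)]


def query_time(build, probe, wait_for_all_builds):
    """The query is done when the slowest partition is done."""
    return max(finish_times(build, probe, wait_for_all_builds))


# Hypothetical workloads as (build durations, probe durations).
scenarios = {
    "balanced": ([5, 5], [10, 10]),  # random keys: builds finish together
    "aligned skew": ([1, 9], [1, 9]),  # part2 is large on both sides
    "flipped skew": ([1, 9], [9, 1]),  # part1: small build, large probe
}

for name, (build, probe) in scenarios.items():
    independent = query_time(build, probe, wait_for_all_builds=False)
    barriered = query_time(build, probe, wait_for_all_builds=True)
    print(f"{name}: independent={independent}, wait-for-all={barriered}")
```

Under these assumptions only the flipped case gets slower with the barrier (10 vs 18 time units); the balanced and aligned cases take the same time either way. Whether real workloads behave like this is exactly what the benchmarks need to confirm.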
