berkaysynnada opened a new issue, #9941: URL: https://github.com/apache/arrow-datafusion/issues/9941
### Is your feature request related to a problem or challenge? There is a TODO item in CrossJoin: https://github.com/apache/arrow-datafusion/blob/2f550032140d42d1ee6d8ed86f7790766fa7302e/datafusion/physical-plan/src/joins/cross_join.rs#L122 Currently CrossJoin partition count is the partition count of the right child. We can increase parallelism here if allowed. ### Describe the solution you'd like Let's say left has M and right has N partitions, and the target partition is T. We can increase the parallelism by getting the left partitions count to floor[T/N] (assuming T is not smaller than N). If (M x N) is smaller or equal than T, there would be no need to coalesce left partitions also. ### Describe alternatives you've considered - ### Additional context Theoretically, for example, 1x8 partitions of joins does the same amount of unit work with 2x4, but in practice, 2x4 parallelism may be more preferable (I have no solid evidence). So, without changing the target partitions, such kind of parallelism adjustment also be done if it is proved that it works better. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
