berkaysynnada opened a new issue, #9941:
URL: https://github.com/apache/arrow-datafusion/issues/9941

   ### Is your feature request related to a problem or challenge?
   
   There is a TODO item in CrossJoin: 
https://github.com/apache/arrow-datafusion/blob/2f550032140d42d1ee6d8ed86f7790766fa7302e/datafusion/physical-plan/src/joins/cross_join.rs#L122
   
   Currently CrossJoin partition count is the partition count of the right 
child. We can increase parallelism here if allowed.
   
   ### Describe the solution you'd like
   
   Let's say left has M and right has N partitions, and the target partition is 
T. We can increase the parallelism by getting the left partitions count to 
floor[T/N] (assuming T is not smaller than N). If (M x N) is smaller or equal 
than T, there would be no need to coalesce left partitions also.
   
   ### Describe alternatives you've considered
   
   -
   
   ### Additional context
   
   Theoretically, for example, 1x8 partitions of joins does the same amount of 
unit work with 2x4, but in practice, 2x4 parallelism may be more preferable (I 
have no solid evidence). So, without changing the target partitions, such kind 
of parallelism adjustment also be done if it is proved that it works better.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to