[GitHub] [arrow-datafusion] Dandandan commented on issue #5999: Improve DataFusion scalability as more cores are added

via GitHub Tue, 18 Apr 2023 01:17:20 -0700


Dandandan commented on issue #5999:
URL: 
https://github.com/apache/arrow-datafusion/issues/5999#issuecomment-1512657524


   I could also reproduce the non-perfect scaling when loading the 
tables into memory.
   
   One thing I noticed is that the current round-robin `RepartitionExec` 
doesn't spread the batches evenly over the output channels, which can 
already be seen in the `MemoryExec` output itself:
    
   `MemoryExec: partitions=32, partition_sizes=[32, 32, 32, 32, 32, 32, 32, 32, 
26, 26, 26, 25, 25, 25, 25, 25, 25, 25, 16, 16, 16, 16, 16, 16, 16, 16, 16, 16, 
16, 16, 16, 16], metrics=[]`
   
   The distribution is biased toward the first partitions because the outputs 
always go to those channels first.
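
To illustrate the kind of bias described above, here is a minimal toy model in Rust. It is not DataFusion's actual `RepartitionExec` code: the `distribute` function, the workload of 4 inputs with 3 batches each, and the 8 output channels are all hypothetical, and it assumes that every input begins its round-robin at channel 0. Under that assumption the first channels collect all the batches, while staggering the starting channel per input spreads them out.

```rust
/// Toy model (not DataFusion code): count how many batches each output
/// channel receives when every input distributes its batches round-robin.
/// `start_for(input_idx)` picks the output channel an input starts at.
fn distribute(
    batches_per_input: &[usize],
    num_outputs: usize,
    start_for: impl Fn(usize) -> usize,
) -> Vec<usize> {
    let mut counts = vec![0usize; num_outputs];
    for (input_idx, &num_batches) in batches_per_input.iter().enumerate() {
        let mut next = start_for(input_idx) % num_outputs;
        for _ in 0..num_batches {
            counts[next] += 1;
            next = (next + 1) % num_outputs;
        }
    }
    counts
}

fn main() {
    // Hypothetical workload: 4 inputs with 3 batches each, 8 output channels.
    let inputs = [3usize; 4];

    // Every input starts at channel 0: the first channels get everything.
    let biased = distribute(&inputs, 8, |_| 0);
    println!("start at 0: {:?}", biased); // [4, 4, 4, 0, 0, 0, 0, 0]

    // Each input starts 2 channels after the previous one.
    let staggered = distribute(&inputs, 8, |i| i * 2);
    println!("staggered:  {:?}", staggered); // [2, 1, 2, 1, 2, 1, 2, 1]
}
```

The staggered start is only one possible mitigation, shown here to make the effect visible. It does not reproduce the exact `partition_sizes` reported above.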


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

