[GitHub] [arrow-ballista] andygrove commented on pull request #222: MINOR: Increase default shuffle.partitions to 16

GitBox Fri, 16 Sep 2022 10:08:04 -0700


andygrove commented on PR #222:
URL: https://github.com/apache/arrow-ballista/pull/222#issuecomment-1249593288


   > Might be good to provide an automated way of creating the number of
   > partitions based on a heuristic on startup if the configuration is not set like
   > DataFusion does? E.g. `SUM(core_count)`.
   
   This is challenging because it is the scheduler process that needs to use 
this setting, and it has no way today of knowing how many cores are in the 
cluster AFAIK. The scheduler may start up before any executors as well.
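
To make the startup-ordering problem concrete, here is a minimal sketch of the suggested `SUM(core_count)` heuristic with a fixed fallback. It is not Ballista code: `heuristic_shuffle_partitions` and `FALLBACK_SHUFFLE_PARTITIONS` are hypothetical names, and it assumes the scheduler could somehow be given the core counts of the executors registered so far, which it cannot do today.

```rust
/// Fallback used when no executors are known yet.
/// 16 is the default proposed in this PR; the constant name is hypothetical.
const FALLBACK_SHUFFLE_PARTITIONS: usize = 16;

/// Hypothetical helper: sum the core counts of the executors registered so far,
/// and fall back to the fixed default when none have registered.
fn heuristic_shuffle_partitions(executor_cores: &[usize]) -> usize {
    let total: usize = executor_cores.iter().sum();
    if total == 0 {
        // The scheduler may start before any executor, so there is nothing to sum.
        FALLBACK_SHUFFLE_PARTITIONS
    } else {
        total
    }
}
```

With an empty slice the helper returns the fallback, which is exactly the situation a scheduler is in when it starts before its executors. A heuristic of this kind would therefore still need a sensible static default.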
   
   Eventually, I would expect us to dynamically re-partition and coalesce 
partitions in a similar way to Spark AQE, based on the size and number of 
partitions coming out of each query stage.
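
As a rough illustration of the coalescing half of that idea, the sketch below greedily merges adjacent partitions until each group approaches a target size. It is neither Spark's AQE algorithm nor anything that exists in Ballista; `coalesce_partitions` and `target_bytes` are hypothetical, and the per-partition sizes are assumed to be known once a query stage completes.

```rust
/// Hypothetical sketch: greedily merge adjacent shuffle partitions so that each
/// group holds at most roughly `target_bytes`. Returns groups of partition indices.
fn coalesce_partitions(sizes: &[u64], target_bytes: u64) -> Vec<Vec<usize>> {
    let mut groups: Vec<Vec<usize>> = Vec::new();
    let mut current: Vec<usize> = Vec::new();
    let mut current_bytes = 0u64;

    for (idx, &size) in sizes.iter().enumerate() {
        // Close the current group if adding this partition would exceed the target.
        if !current.is_empty() && current_bytes + size > target_bytes {
            groups.push(std::mem::take(&mut current));
            current_bytes = 0;
        }
        // A partition larger than the target still gets a group of its own.
        current.push(idx);
        current_bytes += size;
    }
    if !current.is_empty() {
        groups.push(current);
    }
    groups
}
```

For example, sizes `[10, 20, 100, 5, 5]` with a target of 64 bytes yield three groups: partitions 0 and 1 together, partition 2 alone, and partitions 3 and 4 together. Splitting oversized partitions (the re-partition half) is not covered here.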
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

