andygrove commented on PR #222: URL: https://github.com/apache/arrow-ballista/pull/222#issuecomment-1249593288
> Might be good to provide an automated way of creating the number of partitions based on a heuristic on startup if the configuration is not set like DataFusion does? E.g. `SUM(core_count)`.

This is challenging because it is the scheduler process that needs to use this setting, and it has no way today of knowing how many cores are in the cluster AFAIK. The scheduler may start up before any executors as well.

Eventually, I would expect us to dynamically re-partition and coalesce partitions in a similar way to Spark AQE, based on the size and number of partitions coming out of each query stage.
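
To make the coalescing idea concrete, here is a rough sketch of what size-based coalescing could look like. It is only an illustration, not existing Ballista code: the function name `coalesce_partitions`, the `target_bytes` parameter, and the assumption that the scheduler has per-partition output sizes for a completed stage are all hypothetical. It greedily merges adjacent partitions until adding the next one would exceed the target size.

```rust
/// Hypothetical helper: greedily group adjacent partitions of a completed
/// query stage so that each group holds at most roughly `target_bytes`.
/// Returns groups of original partition indices, in order.
fn coalesce_partitions(sizes: &[u64], target_bytes: u64) -> Vec<Vec<usize>> {
    let mut groups: Vec<Vec<usize>> = Vec::new();
    let mut current: Vec<usize> = Vec::new();
    let mut current_bytes: u64 = 0;

    for (idx, &size) in sizes.iter().enumerate() {
        // Close the current group if adding this partition would exceed the target.
        if !current.is_empty() && current_bytes.saturating_add(size) > target_bytes {
            groups.push(std::mem::take(&mut current));
            current_bytes = 0;
        }
        // A partition larger than the target still gets a group of its own.
        current.push(idx);
        current_bytes = current_bytes.saturating_add(size);
    }

    if !current.is_empty() {
        groups.push(current);
    }
    groups
}
```

With sizes `[10, 20, 70, 5, 200, 1]` and a target of `100`, the first three partitions would be merged into one group and the oversized partition would stay on its own. Splitting oversized partitions (the re-partition half of the idea) is not covered by this sketch.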
