andygrove opened a new issue #64: URL: https://github.com/apache/arrow-datafusion/issues/64
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**

The Ballista scheduler breaks a query down into stages based on changes in partitioning in the plan, and each stage is broken down into tasks that can be executed concurrently. Rather than trying to run all the partitions at once, Ballista executors process n tasks concurrently and then request new tasks from the scheduler. Adopting this approach in DataFusion would help it scale better, and ideally the same scheduler would be used both to scale across cores in DataFusion and to scale across nodes in Ballista.

**Describe the solution you'd like**

Implement an extensible scheduler in DataFusion and have Ballista extend it to provide distributed execution.

**Describe alternatives you've considered**

None

**Additional context**

None
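As a rough illustration of the pull-based model described in this request, here is a minimal Rust sketch of a single stage whose partitions become tasks, executed by `n` workers that each fetch a new task only after finishing the previous one. All names (`Task`, `TaskScheduler`, `LocalScheduler`, `run_stage`) are hypothetical and are not DataFusion or Ballista APIs. The sketch also assumes std threads instead of an async runtime and uses a placeholder in place of real plan execution.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical unit of work: one partition of one stage.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Task {
    stage_id: usize,
    partition: usize,
}

/// Hypothetical extension point: a local implementation could schedule
/// across cores, and a distributed one could hand tasks to remote executors.
trait TaskScheduler: Send + Sync {
    /// Returns the next runnable task, or None once the stage is drained.
    fn next_task(&self) -> Option<Task>;
}

/// Local scheduler that queues every partition of a stage as a task.
struct LocalScheduler {
    queue: Mutex<VecDeque<Task>>,
}

impl LocalScheduler {
    fn for_stage(stage_id: usize, partitions: usize) -> Self {
        let queue: VecDeque<Task> = (0..partitions)
            .map(|partition| Task { stage_id, partition })
            .collect();
        LocalScheduler { queue: Mutex::new(queue) }
    }
}

impl TaskScheduler for LocalScheduler {
    fn next_task(&self) -> Option<Task> {
        self.queue.lock().unwrap().pop_front()
    }
}

/// Runs a stage with `n` workers. Each worker holds at most one task at a
/// time, so no more than `n` tasks are in flight, and it pulls the next task
/// from the scheduler only after the current one completes.
fn run_stage(scheduler: Arc<dyn TaskScheduler>, n: usize) -> Vec<Task> {
    let completed = Arc::new(Mutex::new(Vec::new()));
    let workers: Vec<_> = (0..n)
        .map(|_| {
            let scheduler = Arc::clone(&scheduler);
            let completed = Arc::clone(&completed);
            thread::spawn(move || {
                while let Some(task) = scheduler.next_task() {
                    // Placeholder for executing this partition's plan fragment.
                    completed.lock().unwrap().push(task);
                }
            })
        })
        .collect();
    for worker in workers {
        worker.join().unwrap();
    }
    // Copy results out and sort by partition, since completion order varies.
    let mut done = completed.lock().unwrap().clone();
    done.sort_by_key(|t| t.partition);
    done
}

fn main() {
    // Stage 0 with 8 partitions, executed by 3 concurrent workers.
    let scheduler: Arc<dyn TaskScheduler> = Arc::new(LocalScheduler::for_stage(0, 8));
    for t in run_stage(scheduler, 3) {
        println!("stage {} partition {} done", t.stage_id, t.partition);
    }
}
```

Because executors pull work instead of launching every partition up front, concurrency is bounded by the number of workers rather than the number of partitions. A distributed implementation of the same (hypothetical) trait could serve tasks to remote executors without changing the worker loop.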
