andygrove opened a new issue #64:
URL: https://github.com/apache/arrow-datafusion/issues/64


   **Is your feature request related to a problem or challenge? Please describe 
what you are trying to do.**
   
   The Ballista scheduler breaks a query down into stages based on changes in 
partitioning in the plan, where each stage is broken down into tasks that can 
be executed concurrently.
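
   The stage split described above can be sketched roughly as follows. This is
an illustration only, not Ballista's planner: `Partitioning`, `Operator`,
`Stage`, and `split_into_stages` are hypothetical names, the plan is simplified
to a linear list of operators, and the "one task per output partition" rule is
an assumption made for the example.

```rust
/// Hypothetical partitioning schemes (not DataFusion's actual types).
#[derive(Debug, Clone, PartialEq)]
enum Partitioning {
    RoundRobin(usize),
    Hash(usize),
}

impl Partitioning {
    fn partition_count(&self) -> usize {
        match self {
            Partitioning::RoundRobin(n) | Partitioning::Hash(n) => *n,
        }
    }
}

/// Hypothetical operator in a simplified, linear physical plan.
#[derive(Debug, Clone, PartialEq)]
struct Operator {
    name: &'static str,
    output_partitioning: Partitioning,
}

/// A run of consecutive operators that share the same partitioning.
#[derive(Debug, PartialEq)]
struct Stage {
    operators: Vec<Operator>,
}

impl Stage {
    /// Assumption for this sketch: one task per output partition.
    fn task_count(&self) -> usize {
        self.operators
            .first()
            .map_or(0, |op| op.output_partitioning.partition_count())
    }
}

/// Start a new stage whenever the partitioning changes between operators.
fn split_into_stages(plan: &[Operator]) -> Vec<Stage> {
    let mut stages: Vec<Stage> = Vec::new();
    for op in plan {
        let same_partitioning = stages
            .last()
            .and_then(|stage| stage.operators.last())
            .map_or(false, |prev| prev.output_partitioning == op.output_partitioning);
        if same_partitioning {
            stages.last_mut().unwrap().operators.push(op.clone());
        } else {
            stages.push(Stage { operators: vec![op.clone()] });
        }
    }
    stages
}
```

   For a plan such as scan and filter over 4 round-robin partitions followed
by an aggregate over 2 hash partitions, this sketch yields two stages with 4
and 2 tasks respectively.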
   
   Rather than trying to run all the partitions at once, Ballista executors 
process n concurrent tasks at a time and then request new tasks from the 
scheduler.
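
   A minimal sketch of this pull model, using only the Rust standard library:
an executor owns a fixed number of task slots, and each slot asks the scheduler
for the next task as soon as it finishes the previous one. `Task`, `Scheduler`,
`poll_task`, and `run_executor` are hypothetical names for illustration and do
not correspond to Ballista's actual API; a real executor would run the stage's
plan for the given partition where the comment indicates.

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical unit of work: one partition of one stage.
struct Task {
    stage_id: usize,
    partition: usize,
}

/// Hypothetical scheduler holding the queue of pending tasks.
struct Scheduler {
    pending: Mutex<VecDeque<Task>>,
}

impl Scheduler {
    fn new(tasks: Vec<Task>) -> Self {
        Scheduler {
            pending: Mutex::new(tasks.into_iter().collect()),
        }
    }

    /// Hand out the next pending task, or `None` when the queue is empty.
    fn poll_task(&self) -> Option<Task> {
        self.pending.lock().unwrap().pop_front()
    }
}

/// Run an executor with `concurrent_tasks` slots. Each slot pulls a task,
/// runs it, and then requests another one until no tasks remain.
/// Returns the completed (stage_id, partition) pairs in sorted order.
fn run_executor(scheduler: Arc<Scheduler>, concurrent_tasks: usize) -> Vec<(usize, usize)> {
    let handles: Vec<_> = (0..concurrent_tasks)
        .map(|_| {
            let scheduler = Arc::clone(&scheduler);
            thread::spawn(move || {
                let mut done = Vec::new();
                while let Some(task) = scheduler.poll_task() {
                    // A real executor would execute the task's plan here.
                    done.push((task.stage_id, task.partition));
                }
                done
            })
        })
        .collect();

    let mut completed: Vec<(usize, usize)> = handles
        .into_iter()
        .flat_map(|handle| handle.join().expect("worker thread panicked"))
        .collect();
    completed.sort();
    completed
}
```

   With 8 tasks and `concurrent_tasks` set to 3, at most three tasks are in
flight at any moment, yet all 8 complete. The same loop could in principle
run against a remote scheduler instead of an in-process queue.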
   
   This approach would help DataFusion scale better. Ideally, the same 
scheduler would be used to scale across cores in DataFusion and across nodes 
in Ballista.
   
   **Describe the solution you'd like**
   
   Implement an extensible scheduler in DataFusion and have Ballista extend it 
to provide distributed execution.
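
   One possible shape for such an extension point, offered purely as a sketch:
a trait that DataFusion could implement for in-process execution and that
Ballista could implement or wrap for distributed execution. The trait
`TaskScheduler`, its methods, `LocalScheduler`, and `TrackingScheduler` are all
hypothetical and are not part of either project's API. `TrackingScheduler` only
records assignments and stands in for whatever a distributed scheduler would
actually add.

```rust
use std::collections::VecDeque;

/// Hypothetical unit of work: one partition of one stage.
#[derive(Debug, Clone, PartialEq)]
struct Task {
    stage_id: usize,
    partition: usize,
}

/// Hypothetical extension point shared by local and distributed schedulers.
trait TaskScheduler {
    /// Queue tasks for execution.
    fn submit(&mut self, tasks: Vec<Task>);
    /// Called by an executor (a core or a node) to request its next task.
    fn next_task(&mut self, executor_id: &str) -> Option<Task>;
}

/// In-process FIFO scheduler, as DataFusion might use across cores.
#[derive(Default)]
struct LocalScheduler {
    queue: VecDeque<Task>,
}

impl TaskScheduler for LocalScheduler {
    fn submit(&mut self, tasks: Vec<Task>) {
        self.queue.extend(tasks);
    }

    fn next_task(&mut self, _executor_id: &str) -> Option<Task> {
        self.queue.pop_front()
    }
}

/// Wrapper that extends any scheduler by recording which executor received
/// which task (a stand-in for the bookkeeping a distributed scheduler needs).
struct TrackingScheduler<S: TaskScheduler> {
    inner: S,
    assignments: Vec<(String, Task)>,
}

impl<S: TaskScheduler> TaskScheduler for TrackingScheduler<S> {
    fn submit(&mut self, tasks: Vec<Task>) {
        self.inner.submit(tasks);
    }

    fn next_task(&mut self, executor_id: &str) -> Option<Task> {
        let task = self.inner.next_task(executor_id)?;
        self.assignments.push((executor_id.to_string(), task.clone()));
        Some(task)
    }
}

/// Pull tasks for one executor until the scheduler has none left.
fn drain(scheduler: &mut dyn TaskScheduler, executor_id: &str) -> Vec<Task> {
    let mut tasks = Vec::new();
    while let Some(task) = scheduler.next_task(executor_id) {
        tasks.push(task);
    }
    tasks
}
```

   Because `drain` depends only on the trait, the executor-side loop stays the
same whether the scheduler is local or wrapped with additional behavior.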
   
   **Describe alternatives you've considered**
   None
   
   **Additional context**
   None
   

