[GitHub] [arrow-datafusion] yahoNanJing edited a comment on pull request #1560: Introduce push-based task scheduling for Ballista

GitBox Tue, 18 Jan 2022 23:25:56 -0800


yahoNanJing edited a comment on pull request #1560:
URL: https://github.com/apache/arrow-datafusion/pull/1560#issuecomment-1016154618



   Hi @realno, thanks for your comments.
   
   As mentioned in the design document, the poll/pull model for task assignment has several disadvantages.
   - Because there is no master that holds the cluster topology, it is hard to apply global optimizations to task assignment, such as local shuffle.
   - Fetching the next task from the scheduler is inefficient. In the current implementation, the scheduler has to scan all of the tasks to check whether each one can be scheduled. The design document also includes our benchmark results. As the scheduler handles more and more tasks, the performance of the poll/pull model degrades. The push model, by contrast, can check tasks at the job level, so its task scheduling performance does not degrade.
   - Polling wastes CPU and may add 100ms of latency, which is not good for interactive queries that may need to finish within 1s.
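
To illustrate the second point, here is a minimal sketch of the two lookup patterns. It is not Ballista's actual code: `Task`, `poll_next_task`, and `PushScheduler` are hypothetical names made up for this example. The assumption is that the poll path walks one flat task list on every request, while the push path keeps a ready queue per job and only touches the job that just changed.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical task record, not a Ballista type.
#[derive(Debug, Clone, PartialEq)]
struct Task {
    job_id: String,
    index: usize,
    runnable: bool,
}

/// Poll/pull style: each poll scans the flat list of all tasks.
/// Returns the picked task and the number of entries inspected,
/// which grows with the total number of tasks the scheduler holds.
fn poll_next_task(all_tasks: &mut Vec<Task>) -> (Option<Task>, usize) {
    match all_tasks.iter().position(|t| t.runnable) {
        Some(pos) => (Some(all_tasks.remove(pos)), pos + 1),
        None => (None, all_tasks.len()),
    }
}

/// Push style: runnable tasks are queued per job, so handling an
/// event for one job never looks at the tasks of other jobs.
struct PushScheduler {
    ready: HashMap<String, VecDeque<Task>>,
}

impl PushScheduler {
    fn new() -> Self {
        PushScheduler { ready: HashMap::new() }
    }

    /// Record that a task became runnable (e.g. its inputs finished).
    fn mark_ready(&mut self, task: Task) {
        self.ready
            .entry(task.job_id.clone())
            .or_default()
            .push_back(task);
    }

    /// Take the next runnable task of a single job, if any.
    fn next_for_job(&mut self, job_id: &str) -> Option<Task> {
        self.ready.get_mut(job_id)?.pop_front()
    }
}
```

In this sketch, the cost of `poll_next_task` depends on how many tasks sit in front of the first runnable one, while `next_for_job` is a hash lookup plus a queue pop regardless of how many other jobs exist.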
   
   The poll/pull model is simple. However, as far as I know, it is rarely used in existing production environments.
   
   Besides, this PR does not remove the code for the poll/pull model. Users can choose which model to use for their own use cases.

