yahoNanJing commented on PR #59:
URL: https://github.com/apache/arrow-ballista/pull/59#issuecomment-1153417260

   Thanks @thinkharderdev for the great work. The code is well implemented and 
well documented. And the task scheduling algorithm is also very sophisticated.
   
   I only have two small concerns:
   1. For the stage-based task scheduling, when a stage finishes, it's better 
to schedule the tasks in its following stages at all once. However, for the 
algorithm based on the *ExecutorReservation*, it seems this kind of all-at-once 
scheduling only happens when the job submitted. Maybe better to keep the 
previous event of *StageFinished*.
   2. For every state change for a job, like task status update, this 
*ExecutionGraph*-based implement needs to fetch and decode the *ExecutionGraph* 
from the config backend. Will it be too heavy, especially when there're 
thousands or millions of tasks for a job? Actually the previous design of 
keeping the tasks in memory aims to reduce such kind of cost. Therefore, I 
prefer not to persist the task status info into the config backend. 
   3. For the job scheduling policy, this implementation makes it possible for 
one job to be scheduled by multiple schedulers. However, I think it's not 
necessary. It's better to employ an active-standby policy. And make the 
recovery level to be stage level rather than task level if the job's active 
scheduler terminated. Then we can avoid the *ExecutionGraph* decoding cost for 
every task update.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Reply via email to