mis98zb commented on PR #32122: URL: https://github.com/apache/airflow/pull/32122#issuecomment-1696888654
In our experience, the current implementation has a performance issue when a large expansion is combined with a small concurrency limit. For example, suppose several pipelines each expand a task over 500+ items and the task-group concurrency limit is 8. While the first 8 mapping indexes are running, `_schedule_dag_run()` selects the schedulable TIs of all valid mapping indexes up to the query limit, which may be indexes [8, 39]. However, because of the group concurrency limit, all of them are then rejected by `_executable_task_instances_to_queued()` and marked as starved so that `_schedule_dag_run()` does not select them again. This can waste several rounds of `_executable_task_instances_to_queued()`, which hurts scheduler performance.

I'm considering moving the group-concurrency filter logic from `_executable_task_instances_to_queued()` into `_schedule_dag_run()`; more precisely, I'm thinking about putting the filter code in `DagRun.task_instance_scheduling_decisions()`. What do you think? @uranusjr @eladkal
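To make the cost concrete, here is a toy model of the loop described above. It is a sketch only: the names `QUERY_LIMIT`, `GROUP_CONCURRENCY`, and `wasted_selections` are illustrative, not Airflow settings or APIs, and it assumes (unrealistically, but harmlessly for the comparison) that every queued TI finishes before the next scheduling round.

```python
QUERY_LIMIT = 32          # illustrative: max TIs one selection query returns
GROUP_CONCURRENCY = 8     # task-group concurrency limit from the example
TOTAL_MAPPED = 500        # mapped task instances to run

def wasted_selections(filter_in_selection: bool) -> int:
    """Count TIs that get selected and then rejected for the group limit.

    filter_in_selection=False models the current behaviour (selection
    ignores the group limit); True models filtering at selection time,
    as proposed above.
    """
    done = 0
    wasted = 0
    while done < TOTAL_MAPPED:
        remaining = TOTAL_MAPPED - done
        if filter_in_selection:
            # Proposed: apply the group limit while selecting schedulable TIs.
            candidates = min(QUERY_LIMIT, GROUP_CONCURRENCY, remaining)
        else:
            # Current: select up to the query limit, ignoring the group limit.
            candidates = min(QUERY_LIMIT, remaining)
        accepted = min(candidates, GROUP_CONCURRENCY)
        wasted += candidates - accepted   # these are marked starved
        done += accepted                  # assume all accepted TIs finish
    return wasted
```

Under this model the current behaviour repeatedly selects ~32 candidates only to queue 8, so roughly three quarters of each selection is rejected and re-marked, while filtering at selection time produces no rejected candidates at all.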
