GitHub user Asquator edited a discussion: Redesign the scheduler logic to avoid starvation due to dropped tasks in critical section
The way the critical section works now is:

1. Fire a `select` query and get at most `max_tis` task instances to schedule.
2. Loop over the tasks to check concurrency limits and find tasks eligible for scheduling.
3. If at least one task instance is found, exit and send the good tasks to the executors.
4. Otherwise, update the `starved_` filters and try again.

The third step can cause any number of tasks to be dropped due to concurrency limits (as long as at least one ready task is found), and only a few tasks will survive. At the same time, ready tasks will queue up in the table without getting a chance to run. This can cause tasks to starve for a long time in edge cases such as almost-full prioritized pools, as pointed out here: https://github.com/apache/airflow/issues/45636

We have to rethink the scheduler logic (the query, or the loop altogether) to avoid this kind of starvation. A simplified sketch of the current behaviour follows.

GitHub link: https://github.com/apache/airflow/discussions/49160
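Below is a minimal, hypothetical Python sketch of the loop described above, assuming only per-pool slot limits. The names `TaskInstance`, `fetch_candidates`, and `critical_section` are illustrative stand-ins, not the actual scheduler code. It shows how the round ends as soon as a single task survives, while the rest of the batch is dropped and lower-priority ready tasks that never made it into the batch keep waiting.

```python
from dataclasses import dataclass


@dataclass
class TaskInstance:
    task_id: str
    pool: str
    priority: int


def fetch_candidates(all_tis, starved_pools, max_tis):
    """Step 1: select at most `max_tis` candidates, skipping starved pools,
    ordered by priority (highest first)."""
    eligible = [ti for ti in all_tis if ti.pool not in starved_pools]
    return sorted(eligible, key=lambda ti: -ti.priority)[:max_tis]


def critical_section(all_tis, pool_slots, max_tis=32):
    """Steps 2-4: loop until at least one task instance survives the
    concurrency checks, then return only the survivors."""
    starved_pools = set()
    while True:
        candidates = fetch_candidates(all_tis, starved_pools, max_tis)
        if not candidates:
            return []
        survivors = []
        for ti in candidates:
            if pool_slots.get(ti.pool, 0) > 0:
                pool_slots[ti.pool] -= 1
                survivors.append(ti)
            else:
                # Dropped because of concurrency limits; nothing is re-fetched
                # to take its place in this scheduling round.
                starved_pools.add(ti.pool)
        if survivors:
            # Step 3: exit as soon as *any* task survived, even if most of
            # the batch was dropped.
            return survivors
        # Step 4: otherwise widen the starved filters and try again.


# Example of the starvation edge case: an almost-full prioritized pool fills
# the whole batch, so tasks in "other" never even get fetched, and only one
# task is sent to the executors this round.
tis = [TaskInstance(f"hi_{i}", "prioritized", priority=10) for i in range(32)]
tis += [TaskInstance(f"lo_{i}", "other", priority=1) for i in range(8)]
print(critical_section(tis, {"prioritized": 1, "other": 8}, max_tis=32))
```

Under these assumptions, each scheduling round schedules at most one `prioritized` task and none of the ready `other` tasks, which is the starvation pattern reported in the linked issue.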