kaxil commented on code in PR #54103:
URL: https://github.com/apache/airflow/pull/54103#discussion_r2705351661
##########
airflow-core/src/airflow/jobs/scheduler_job_runner.py:
##########
@@ -490,10 +515,60 @@ def _executable_task_instances_to_queued(self, max_tis: int, session: Session) -
.where(~DM.is_paused)
.where(TI.state == TaskInstanceState.SCHEDULED)
.where(DM.bundle_name.is_not(None))
+ .join(
+ dr_task_concurrency_subquery,
+ and_(
+ TI.dag_id == dr_task_concurrency_subquery.c.dag_id,
+ TI.run_id == dr_task_concurrency_subquery.c.run_id,
+ ),
+ isouter=True,
+ )
+ .where(
+ func.coalesce(dr_task_concurrency_subquery.c.task_per_dr_count, 0) < DM.max_active_tasks
+ )
.options(selectinload(TI.dag_model))
.order_by(-TI.priority_weight, DR.logical_date, TI.map_index)
)
+ # Create a subquery with row numbers partitioned by dag_id and run_id.
+ # Different dags can have the same run_id but
+ # the dag_id combined with the run_id uniquely identify a run.
+ ranked_query = (
+ query.add_columns(
+ func.row_number()
+ .over(
+ partition_by=[TI.dag_id, TI.run_id],
+ order_by=[-TI.priority_weight, DR.logical_date, TI.map_index],
+ )
+ .label("row_num"),
+ DM.max_active_tasks.label("dr_max_active_tasks"),
+ # Create columns for the order_by checks here for sqlite.
Review Comment:
The row_number ranking happens here, before the starvation filters
(starved_pools, etc.) are applied below. In the original code, those filters
were applied BEFORE the limit.
Tasks in starved pools will therefore consume row_number slots and then get
filtered out, potentially excluding schedulable tasks from the same dag run.
Should we apply the starvation filters to the base query before building
ranked_query?
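To make the ordering concern concrete, here is a minimal pure-Python sketch. It is not the scheduler code: the `Task` dataclass, the helper names, and the `limit` parameter (standing in for the per-run cap derived from `max_active_tasks`) are all hypothetical, and the ranking only imitates `ROW_NUMBER() OVER (PARTITION BY dag_id, run_id ...)` in memory.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    # Hypothetical stand-in for a scheduled task instance.
    dag_id: str
    run_id: str
    task_id: str
    pool: str
    priority_weight: int


def rank_within_run(tasks):
    """Assign 1-based row numbers per (dag_id, run_id), highest priority first."""
    counters = defaultdict(int)
    ranked = []
    for t in sorted(tasks, key=lambda t: -t.priority_weight):
        key = (t.dag_id, t.run_id)
        counters[key] += 1
        ranked.append((counters[key], t))
    return ranked


def rank_then_filter(tasks, limit, starved_pools):
    # Order questioned in the review: rank first, drop starved pools afterwards.
    kept = [t for row_num, t in rank_within_run(tasks) if row_num <= limit]
    return [t.task_id for t in kept if t.pool not in starved_pools]


def filter_then_rank(tasks, limit, starved_pools):
    # Suggested order: drop starved pools first, then rank what remains.
    eligible = [t for t in tasks if t.pool not in starved_pools]
    return [t.task_id for row_num, t in rank_within_run(eligible) if row_num <= limit]


tasks = [
    Task("dag_a", "run_1", "t1", "busy_pool", 10),
    Task("dag_a", "run_1", "t2", "busy_pool", 9),
    Task("dag_a", "run_1", "t3", "default_pool", 1),
]
starved = {"busy_pool"}

print(rank_then_filter(tasks, limit=2, starved_pools=starved))  # []
print(filter_then_rank(tasks, limit=2, starved_pools=starved))  # ['t3']
```

With a cap of 2, the two high-priority tasks in the starved pool take both row numbers and are then discarded, so `t3` is never considered even though it could run. Filtering first lets `t3` take row number 1.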
##########
airflow-core/src/airflow/jobs/scheduler_job_runner.py:
##########
@@ -194,6 +207,16 @@ def _is_parent_process() -> bool:
return multiprocessing.current_process().name == "MainProcess"
Review Comment:
This queries the same data as `ConcurrencyMap.load()`, which is still called
and used for the check at lines ~665-680. With SQL-level filtering now in
place, that Python check should mostly pass (barring race conditions). Worth
adding a comment explaining why we keep both?