ashb commented on a change in pull request #10956:
URL: https://github.com/apache/airflow/pull/10956#discussion_r489614886
##########
File path: airflow/models/dag.py
##########
@@ -1572,6 +1661,21 @@ def bulk_sync_to_db(cls, dags: Collection["DAG"], sync_time=None, session=None):
session.add(orm_dag)
orm_dags.append(orm_dag)
+ # Get the latest dag run for each existing dag as a single query (avoid n+1 query)
+ most_recent_dag_runs = dict(session.query(DagRun.dag_id, func.max_(DagRun.execution_date)).filter(
+ DagRun.dag_id.in_(existing_dag_ids),
+ or_(
+ DagRun.run_type == DagRunType.BACKFILL_JOB.value,
+ DagRun.run_type == DagRunType.SCHEDULED.value,
+ ),
+ ).group_by(DagRun.dag_id).all())
+
+ num_active_runs = dict(session.query(DagRun.dag_id, func.count('*')).filter(
+ DagRun.dag_id.in_(existing_dag_ids),
+ DagRun.state == State.RUNNING, # pylint: disable=comparison-with-callable
+ DagRun.external_trigger.is_(False)
+ ).group_by(DagRun.dag_id).all())
Review comment:
@mik-laj This adds 2 queries to the query count of `DAG.bulk_sync_to_db`. Do
you think this is okay, or should I try to turn them into two subqueries
that are included in the existing query on line 1642 (`orm_dags = (`)?
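
To make the trade-off concrete, here is a minimal sketch of both options, written against the standard-library `sqlite3` module rather than Airflow's SQLAlchemy models. The `dag` and `dag_run` tables, their columns, and the `'backfill'` / `'scheduled'` string values are simplified assumptions that only mirror the names in the diff; this is not the PR's implementation.

```python
import sqlite3

# Hypothetical, simplified schema mirroring the names used in the diff.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dag (dag_id TEXT PRIMARY KEY);
    CREATE TABLE dag_run (
        dag_id TEXT, execution_date TEXT, run_type TEXT,
        state TEXT, external_trigger INTEGER
    );
    INSERT INTO dag VALUES ('a'), ('b'), ('c');
    INSERT INTO dag_run VALUES
        ('a', '2020-09-01', 'scheduled', 'success', 0),
        ('a', '2020-09-02', 'scheduled', 'running', 0),
        ('a', '2020-09-03', 'manual',    'running', 1),
        ('b', '2020-09-05', 'backfill',  'running', 0);
""")

dag_ids = ["a", "b", "c"]
placeholders = ",".join("?" * len(dag_ids))

# Option 1: two extra grouped queries, each turned into a dict keyed by dag_id.
# The cost is two round trips in total, regardless of how many DAGs there are.
most_recent = dict(conn.execute(
    f"SELECT dag_id, MAX(execution_date) FROM dag_run "
    f"WHERE dag_id IN ({placeholders}) "
    f"AND run_type IN ('backfill', 'scheduled') "
    f"GROUP BY dag_id", dag_ids))
num_active = dict(conn.execute(
    f"SELECT dag_id, COUNT(*) FROM dag_run "
    f"WHERE dag_id IN ({placeholders}) "
    f"AND state = 'running' AND external_trigger = 0 "
    f"GROUP BY dag_id", dag_ids))

# Option 2: fold both aggregates into the main DAG query as correlated
# scalar subqueries, so no additional round trip is needed.
combined = {
    dag_id: (latest, active)
    for dag_id, latest, active in conn.execute(
        f"""
        SELECT d.dag_id,
               (SELECT MAX(r.execution_date) FROM dag_run r
                 WHERE r.dag_id = d.dag_id
                   AND r.run_type IN ('backfill', 'scheduled')),
               (SELECT COUNT(*) FROM dag_run r
                 WHERE r.dag_id = d.dag_id
                   AND r.state = 'running' AND r.external_trigger = 0)
          FROM dag d
         WHERE d.dag_id IN ({placeholders})
        """, dag_ids)
}
```

One behavioural difference in this sketch: with the grouped queries, a DAG that has no matching runs (`'c'` here) is simply absent from the dicts, so callers need `.get()` with a default, whereas the subquery form returns a row for every DAG with `None` and `0` filled in.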