uranusjr commented on a change in pull request #18897:
URL: https://github.com/apache/airflow/pull/18897#discussion_r731735259



##########
File path: airflow/jobs/scheduler_job.py
##########
@@ -839,30 +839,19 @@ def _create_dag_runs(self, dag_models: Collection[DagModel], session: Session) -
         existing_dagruns = (
             session.query(DagRun.dag_id, DagRun.execution_date).filter(existing_dagruns_filter).all()
         )
-        max_queued_dagruns = conf.getint('core', 'max_queued_runs_per_dag')
 
-        queued_runs_of_dags = defaultdict(
+        active_runs_of_dags = defaultdict(
             int,
-            session.query(DagRun.dag_id, func.count('*'))
-            .filter(  # We use `list` here because SQLA doesn't accept a set
-                # We use set to avoid duplicate dag_ids
-                DagRun.dag_id.in_(list({dm.dag_id for dm in dag_models})),
-                DagRun.state == State.QUEUED,
-            )
-            .group_by(DagRun.dag_id)
-            .all(),
+            DagRun.active_runs_of_dags(dag_ids=[dm.dag_id for dm in dag_models], session=session),

Review comment:
       This no longer deduplicates the dag_ids before querying. (I think the deduplication should probably live inside `DagRun.active_runs_of_dags`, but right now it doesn't do that either.)
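   A runnable toy sketch of what call-site deduplication could look like. `FakeDagModel` and the stand-in `active_runs_of_dags` function here are hypothetical placeholders for the real `DagModel` and `DagRun.active_runs_of_dags`; only the set-then-list dedup pattern is the point:
   
   ```python
   from collections import defaultdict
   
   class FakeDagModel:
       """Hypothetical stand-in for airflow.models.DagModel."""
       def __init__(self, dag_id):
           self.dag_id = dag_id
   
   def active_runs_of_dags(dag_ids, counts):
       # Stand-in for DagRun.active_runs_of_dags: yields (dag_id, count) pairs
       # for the ids that actually have active runs.
       return [(d, counts[d]) for d in dag_ids if d in counts]
   
   dag_models = [FakeDagModel("a"), FakeDagModel("a"), FakeDagModel("b")]
   
   # Deduplicate via a set, then convert to a list because SQLAlchemy's
   # in_() does not accept a set.
   dag_ids = list({dm.dag_id for dm in dag_models})
   
   active = defaultdict(int, active_runs_of_dags(dag_ids, {"a": 2}))
   ```
   
   The same `list({...})` idiom the removed code used would restore the dedup with a one-line change.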

##########
File path: airflow/jobs/scheduler_job.py
##########
@@ -885,12 +874,28 @@ def _create_dag_runs(self, dag_models: Collection[DagModel], session: Session) -
                     dag_hash=dag_hash,
                     creating_job_id=self.id,
                 )
-                queued_runs_of_dags[dag_model.dag_id] += 1
-            dag_model.calculate_dagrun_date_fields(dag, data_interval)
-
+                active_runs_of_dags[dag.dag_id] += 1
+                self._update_dag_next_dagruns(dag, dag_model, active_runs_of_dags[dag.dag_id])

Review comment:
       If we change this to something like
   
   ```python
   for dag_model in dag_models:
       ...
       active_run_count = active_runs_of_dags[dag.dag_id]
       if (dag.dag_id, dag_model.next_dagrun) not in existing_dagruns:
           ...
           active_run_count += 1
       self._update_dag_next_dagruns(dag, dag_model, active_run_count)
   ```
   
   We can get rid of the `defaultdict`, I think. The same goes for the other usage below.
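   A runnable toy version of that suggestion. The names mirror the PR, but `_update_dag_next_dagruns`, `existing_dagruns`, and the data are hypothetical stand-ins for the scheduler internals; the point is that `dict.get(..., 0)` makes `defaultdict(int)` unnecessary:
   
   ```python
   # Plain dict instead of defaultdict(int); missing keys are handled by .get().
   active_runs_of_dags = {"a": 1}
   existing_dagruns = {("a", "2021-10-18")}
   
   updated = []
   
   def _update_dag_next_dagruns(dag_id, count):
       # Stand-in for the real scheduler method; just records its arguments.
       updated.append((dag_id, count))
   
   # (dag_id, next_dagrun) pairs standing in for the DagModel collection.
   dag_models = [("a", "2021-10-18"), ("b", "2021-10-19")]
   
   for dag_id, next_dagrun in dag_models:
       active_run_count = active_runs_of_dags.get(dag_id, 0)
       if (dag_id, next_dagrun) not in existing_dagruns:
           # ... a new DagRun would be created here ...
           active_run_count += 1
       _update_dag_next_dagruns(dag_id, active_run_count)
   ```
   
   Counting into a local variable also avoids mutating the shared mapping inside the loop, which keeps the per-dag bookkeeping self-contained.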




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

