ashb commented on a change in pull request #7476: [AIRFLOW-6856][depends on
AIRFLOW-6857,AIRFLOW-6862] Bulk fetch paused_dag_ids
URL: https://github.com/apache/airflow/pull/7476#discussion_r382559852
##########
File path: airflow/models/dag.py
##########
@@ -1448,63 +1448,88 @@ def create_dagrun(self,
         return run
+    @classmethod
     @provide_session
-    def sync_to_db(self, owner=None, sync_time=None, session=None):
+    def bulk_sync_to_db(cls, dags: List["DAG"], sync_time=None, session=None):
         """
-        Save attributes about this DAG to the DB. Note that this method
+        Save attributes about list of DAG to the DB. Note that this method
         can be called for both DAGs and SubDAGs. A SubDag is actually a
         SubDagOperator.
-        :param dag: the DAG object to save to the DB
-        :type dag: airflow.models.DAG
+        :param dags: the DAG objects to save to the DB
+        :type dags: List[airflow.models.dag.DAG]
         :param sync_time: The time that the DAG should be marked as sync'ed
         :type sync_time: datetime
         :return: None
         """
+        if not dags:
+            return
         from airflow.models.serialized_dag import SerializedDagModel
-        if owner is None:
-            owner = self.owner
         if sync_time is None:
             sync_time = timezone.utcnow()
-
-        orm_dag = session.query(
-            DagModel).filter(DagModel.dag_id == self.dag_id).first()
-        if not orm_dag:
-            orm_dag = DagModel(dag_id=self.dag_id)
-            if self.is_paused_upon_creation is not None:
-                orm_dag.is_paused = self.is_paused_upon_creation
-            self.log.info("Creating ORM DAG for %s", self.dag_id)
+        log.info("Sync %s DAGs", len(dags))
+        dag_by_ids = {dag.dag_id: dag for dag in dags}
+        dag_ids = set(dag_by_ids.keys())
+        orm_dags = session.query(DagModel)\
+            .options(
+                joinedload(DagModel.tags, innerjoin=False)
Review comment:
This loads the tags for all of the DAGs we've loaded in a single query, rather
than needing one extra query for each DAG. The one-query-per-DAG pattern is
commonly called an `n+1` query situation, which, as Kamil has shown, is
expensive because it results in lots of extra queries.
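
As a rough illustration of the difference, here is a minimal sketch that uses
plain `sqlite3` rather than SQLAlchemy. The `dag` and `dag_tag` tables, the
`CountingConnection` wrapper and both helper functions are hypothetical
stand-ins and are not Airflow code. The second function issues a single
`LEFT OUTER JOIN`, which is the kind of statement `joinedload(...,
innerjoin=False)` produces.

```python
import sqlite3
from collections import defaultdict


def make_db():
    """Build a tiny in-memory database (hypothetical, simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE dag (dag_id TEXT PRIMARY KEY);
        CREATE TABLE dag_tag (dag_id TEXT, name TEXT);
        INSERT INTO dag VALUES ('a'), ('b'), ('c');
        INSERT INTO dag_tag VALUES ('a', 'etl'), ('a', 'daily'), ('b', 'etl');
        """
    )
    return conn


class CountingConnection:
    """Thin wrapper that counts how many statements are executed."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def execute(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params)


def tags_n_plus_one(db):
    """One query for the DAGs, then one more query per DAG for its tags."""
    result = {}
    dag_rows = db.execute("SELECT dag_id FROM dag ORDER BY dag_id").fetchall()
    for (dag_id,) in dag_rows:
        tag_rows = db.execute(
            "SELECT name FROM dag_tag WHERE dag_id = ? ORDER BY name", (dag_id,)
        ).fetchall()
        result[dag_id] = [name for (name,) in tag_rows]
    return result


def tags_joined(db):
    """A single LEFT OUTER JOIN that fetches every DAG with its tags."""
    result = defaultdict(list)
    rows = db.execute(
        "SELECT dag.dag_id, dag_tag.name FROM dag "
        "LEFT OUTER JOIN dag_tag ON dag_tag.dag_id = dag.dag_id "
        "ORDER BY dag.dag_id, dag_tag.name"
    ).fetchall()
    for dag_id, name in rows:
        tags = result[dag_id]  # creates an empty list for DAGs without tags
        if name is not None:
            tags.append(name)
    return dict(result)


if __name__ == "__main__":
    slow = CountingConnection(make_db())
    fast = CountingConnection(make_db())
    print(tags_n_plus_one(slow), "queries:", slow.queries)
    print(tags_joined(fast), "queries:", fast.queries)
```

Both functions return the same mapping, but the first issues one statement per
DAG on top of the initial one, while the second always issues exactly one.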
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services