pierrejeambrun commented on code in PR #46626:
URL: https://github.com/apache/airflow/pull/46626#discussion_r1951246708


##########
airflow/models/dagrun.py:
##########
@@ -300,6 +297,21 @@ def validate_run_id(self, key: str, run_id: str) -> str | 
None:
             f"The run_id provided '{run_id}' does not match regex pattern 
'{regex}' or '{RUN_ID_REGEX}'"
         )
 
+    @provide_session
+    def dag_versions(self, session: Session = NEW_SESSION) -> list[DagVersion]:
+        """
+        Return the DAG versions associated with the TIs of this DagRun.
+
+        :param session: database session
+        """
+        tis = session.scalars(
+            select(TI.dag_version_id).where(TI.run_id == self.run_id, 
TI.dag_id == self.dag_id).distinct()
+        ).all()
+        tih = session.scalars(
+            select(TIH.dag_version_id).where(TIH.run_id == self.run_id, 
TIH.dag_id == self.dag_id).distinct()
+        ).all()
+        return list(tis + tih)

Review Comment:
   The only issue I see with that is:
   
   A query will be emitted each time we do:
   ```
   dag_run.dag_versions()
   ```
   
   When we will serialize multiple dag_runs (100 of them), this will happen in 
a loop, resulting in an extra 100 query being executed to fetch dag_versions 
for each dag_run.
   
   What would be great is the ability to query dag_run and eagerly load that: 
(pseudo code)
   ```
   query(DagRun).options(joinedload(DagRun.dag_versions))
   ```
   
   This way in the webserver we can eagerly load all the `dag_versions` of the 
dag_run query all at the same time. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to