pierrejeambrun commented on code in PR #50984:
URL: https://github.com/apache/airflow/pull/50984#discussion_r2104534039


##########
airflow-core/src/airflow/api_fastapi/core_api/routes/public/dags.py:
##########
@@ -115,30 +116,54 @@ def get_dags(
     session: SessionDep,
 ) -> DAGCollectionResponse:
     """Get all DAGs."""
-    dag_runs_select = None
+    query = select(DagModel)
 
-    if dag_run_state.value or dag_run_start_date_range.is_active() or 
dag_run_end_date_range.is_active():
-        dag_runs_select, _ = paginated_select(
-            statement=select(DagRun),
+    max_run_id_query = (  # ordering by id will not always be "latest run", 
but it's a simplifying assumption
+        select(DagRun.dag_id, func.max(DagRun.id).label("max_dag_run_id"))
+        .where(DagRun.start_date.is_not(null()))
+        .group_by(DagRun.dag_id)
+        .subquery(name="mrq")
+    )
+
+    has_max_run_filter = (
+        dag_run_state.value
+        or last_dag_run_state.value
+        or dag_run_start_date_range.is_active()
+        or dag_run_end_date_range.is_active()
+    )
+
+    if has_max_run_filter or order_by.value in (
+        "last_run_state",
+        "last_run_start_date",
+        "-last_run_state",
+        "-last_run_start_date",
+    ):
+        query = query.join(
+            max_run_id_query,
+            DagModel.dag_id == max_run_id_query.c.dag_id,
+            isouter=True,
+        ).join(DagRun, DagRun.id == max_run_id_query.c.max_dag_run_id, 
isouter=True)

Review Comment:
   That's a nice optimization, but I think it's really minor, at a cost of 
maintainability. This hardcode a bunch of filters, and I'm sure in a bit when 
we edit this endpoint a few times, we will end up with filters missing in here.
   
   But that's not a big deal, I'm ok merging this.



##########
airflow-core/tests/unit/api_fastapi/core_api/routes/public/test_dags.py:
##########
@@ -220,10 +220,10 @@ class TestGetDags(TestDagEndpoint):
                     "dag_run_state": "failed",
                     "exclude_stale": False,
                 },
-                1,
-                [DAG3_ID],
+                0,
+                [],

Review Comment:
   +1 



##########
airflow-core/src/airflow/api_fastapi/core_api/routes/public/dags.py:
##########
@@ -115,30 +116,54 @@ def get_dags(
     session: SessionDep,
 ) -> DAGCollectionResponse:
     """Get all DAGs."""
-    dag_runs_select = None
+    query = select(DagModel)
 
-    if dag_run_state.value or dag_run_start_date_range.is_active() or 
dag_run_end_date_range.is_active():
-        dag_runs_select, _ = paginated_select(
-            statement=select(DagRun),
+    max_run_id_query = (  # ordering by id will not always be "latest run", 
but it's a simplifying assumption
+        select(DagRun.dag_id, func.max(DagRun.id).label("max_dag_run_id"))

Review Comment:
   I would love to avoid doing this, we can add an extra column on an 
aggregated `DagRun.id`, for instance keep it on `func.max(DagRun.start_date)` 
and add `func.max(DagRun.id)`. This way when we join, we do that both on the 
`start_date` + `DagRun.id` + `Dag.id`. This will ensure only 1 - 1 row matching.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to