yuseok89 commented on code in PR #67242:
URL: https://github.com/apache/airflow/pull/67242#discussion_r3313074533


##########
airflow-core/src/airflow/api_fastapi/core_api/routes/ui/dags.py:
##########
@@ -308,3 +311,45 @@ def get_latest_run_info(dag_id: str, session: SessionDep) 
-> DAGRunLightResponse
     latest_run_info = session.execute(latest_run_info_select).one_or_none()
 
     return DAGRunLightResponse(**latest_run_info._mapping) if latest_run_info 
else None
+
+
+@dags_router.get(
+    "/run_state_counts",
+    dependencies=[
+        Depends(requires_access_dag(method="GET")),
+        Depends(requires_access_dag(method="GET", 
access_entity=DagAccessEntity.RUN)),
+    ],
+    operation_id="get_dag_run_state_counts_ui",
+)
+def get_dag_run_state_counts(
+    session: SessionDep,
+    readable_dags_filter: ReadableDagsFilterDep,
+    dag_ids: Annotated[list[str], Query(min_length=1)],
+    run_after_gte: datetime | None = None,
+) -> DAGsRunStateCountsCollectionResponse:
+    """Return per-Dag DagRun state counts (zero-filled) for the Dag list 
page."""
+    permitted_dag_ids = readable_dags_filter.value or set()
+    requested_dag_ids = [dag_id for dag_id in dict.fromkeys(dag_ids) if dag_id 
in permitted_dag_ids]
+    counts_by_dag: dict[str, dict[DagRunState, int]] = {
+        dag_id: {state: 0 for state in DagRunState} for dag_id in 
requested_dag_ids
+    }
+
+    if requested_dag_ids:
+        count_query = (
+            select(DagRun.dag_id, DagRun.state, func.count().label("cnt"))
+            .where(DagRun.dag_id.in_(requested_dag_ids))
+            .group_by(DagRun.dag_id, DagRun.state)
+        )
+        if run_after_gte is not None:
+            count_query = count_query.where(DagRun.run_after >= run_after_gte)
+        for row in session.execute(count_query):
+            if row.state is None:
+                continue
+            counts_by_dag[row.dag_id][DagRunState(row.state)] = row.cnt
+

Review Comment:
   This was a point I hadn't thought of. Thanks for catching it.
   
   It's taking me a bit of time to test this part directly. Getting test 
results with actual numbers will take even longer, but there are a few things 
I'd like to discuss before then, and I wanted to check with you before heading 
further in the wrong direction.
   
   You mentioned 10M, but I think 5M was already enough for a solid test.
   <img width="796" height="119" alt="image" 
src="https://github.com/user-attachments/assets/64687dc0-76d3-45d0-a4ec-f47f0819b8ae";
 />
   
   That said, a few questions came up while testing. You pointed to the Home 
page (which historical_metrics backs) as already capping these counts at 1K+, 
but **is that cap actually effective?**
   With 5M DagRuns inserted, loading the Home page itself is often pending for 
30s+, which really wasn't much different from loading the Dags page with the 
unbounded SQL (the initial approach).
   
   The historical_metrics you mentioned looks like it makes 4 calls, one per 
state. Similarly, I tried running this Dags state count as 4 queries, one per 
state (each UNION-ing up to 50 dag_ids per page), capped at 1K+, and checked 
the result.
   This too showed no noticeable difference in execution time.
   If my testing direction is wrong, I'd appreciate you letting me know.
   
   I'll try to attach the numeric test results within this week.
   
   My question here is: even when there's no meaningful difference like this, 
is giving users the limited `1K+` information the right direction? Even if 
there were a small difference, I wonder whether there's a better option than 
making most users see limited information because of a design built around a 
tiny minority of heavy users. What do you think?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to