gingeekrishna commented on PR #67572:
URL: https://github.com/apache/airflow/pull/67572#issuecomment-4643793756

   Thanks for the thorough flag, @1fanwang — and for being upfront that the 
issue framing was ambiguous.
   
   Re-reading #66796, I now see I misread it: the issue describes `get_count()` 
and the `serialized_dag.count` emit as if they already existed and needed a 
one-line fix. Since they don't exist in `main`, I effectively added them from 
scratch — that's a scope mismatch from what was intended.
   
   On the core question of whether the `serialized_dag.count` gauge is worth 
having: I do think there's operational value in being able to monitor how many 
serialized DAGs are in the DB (it's a useful signal for debugging serialization 
issues, DAG disappearance alerts, etc.). But I understand the concern about a 
`COUNT(*)` on every parse loop — the performance trade-off is real, especially 
at scale.
   
   Happy to take any of these directions based on your guidance:
   1. **Keep the PR** — address the performance concern by adding throttling 
(emit once every N loops), which would make the DB cost negligible
   2. **Narrow the scope** — if the issue is closed as "no change needed" 
(since `get_count()` doesn't exist, there's nothing to fix), I can close this 
PR too
   3. **Split it** — keep `SerializedDagModel.get_count()` as a utility method 
but leave the metric emission for a separate RFC/discussion
   
   Also, FYI — the merge conflicts have been resolved and the branch is now up 
to date with `main`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to