gingeekrishna commented on PR #67572: URL: https://github.com/apache/airflow/pull/67572#issuecomment-4643793756
Thanks for the thorough flag, @1fanwang — and for being upfront that the issue framing was ambiguous. Re-reading #66796, I now see I misread it: the issue describes `get_count()` and the `serialized_dag.count` emit as if they already existed and needed a one-line fix. Since they don't exist in `main`, I effectively added them from scratch — that's a scope mismatch from what was intended. On the core question of whether the `serialized_dag.count` gauge is worth having: I do think there's operational value in being able to monitor how many serialized DAGs are in the DB (it's a useful signal for debugging serialization issues, DAG disappearance alerts, etc.). But I understand the concern about a `COUNT(*)` on every parse loop — the performance trade-off is real, especially at scale. Happy to take any of these directions based on your guidance: 1. **Keep the PR** — address the performance concern by adding throttling (emit once every N loops), which would make the DB cost negligible 2. **Narrow the scope** — if the issue is closed as "no change needed" (since `get_count()` doesn't exist, there's nothing to fix), I can close this PR too 3. **Split it** — keep `SerializedDagModel.get_count()` as a utility method but leave the metric emission for a separate RFC/discussion Also, FYI — the merge conflicts have been resolved and the branch is now up to date with `main`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
