diogosilva30 opened a new pull request, #68078: URL: https://github.com/apache/airflow/pull/68078
closes: #68077

## Why

The API server serves the Edge Worker REST API (`/edge_worker/v1/...`). A worker heartbeat (`PATCH /edge_worker/v1/worker/<name>`) runs `set_state` → `set_metrics`, which records `edge_worker.*` metrics through the **Task SDK** `Stats` singleton (resolved by the Edge provider via `airflow.providers.common.compat`).

Every other long-running component initializes that singleton: `scheduler_job_runner.py`, `triggerer_job_runner.py`, `dag_processing/manager.py`, `executors/base_executor.py`, `task-sdk/.../task_runner.py`, and `task-sdk/.../serde/__init__.py` all call `stats.initialize(factory=stats_utils.get_stats_factory(), export_legacy_names=...)`. The **API server never does**.

Before #63932 (*Remove the DualStatsManager and the Stats interfaces*), `Stats` lazily auto-initialized its backend on first use. #63932 replaced that with an explicit `Stats.initialize(...)` call plus a PID guard, and added the explicit call to the components above but **not** to the API server.

As a result, the Task SDK `Stats` singleton in the API server process stays a `NoStatsLogger`, and every Edge Worker metric is silently dropped. This also explains the asymmetry where `api_server.*` metrics still work (they use the separately initialized **core** stats path) while Edge metrics (the **SDK** path) vanish.

## What

Initialize the Task SDK `Stats` singleton from the FastAPI `lifespan` (which runs once per worker process, after fork), mirroring the existing initialization in `serde` / `task_runner`. The call is guarded so that a metrics misconfiguration can never block API server startup.

## How verified

- New unit tests in `airflow-core/tests/unit/api_fastapi/test_app.py` assert that `lifespan` initializes the Task SDK `Stats` with the configured factory, and that an initialization failure is swallowed (startup is not blocked).
- Verified on a live Airflow 3.2.2 + `edge3` 3.7.0 deployment: before the fix, `Stats.instance` is a `NoStatsLogger` and no `edge_worker.*` series are exported; after the fix, 200+ `edge_worker.*` series are exported, correctly tagged with `worker_name`.

## Relationship to #67328

Complementary, not a duplicate. #67328 makes `edge3` dual-emit the legacy dotted form so that old StatsD mappings match again (a tag/naming concern). It does **not** initialize the `Stats` singleton: a `DualStatsManager.gauge(...)` call made without `Stats.initialize()` is also dropped. This PR fixes the root-cause initialization gap.
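The failure mode and the fix can be illustrated with a small, stdlib-only model. This is a sketch under stated assumptions, not the PR's code: `Stats`, `NoStatsLogger`, `RecordingStatsLogger`, `make_lifespan`, `serve_once`, and the metric name `edge_worker.example_metric` are hypothetical stand-ins rather than Airflow or Task SDK classes, and `lifespan` only mimics FastAPI's async-context-manager convention without importing FastAPI. It shows why a never-initialized singleton silently drops metrics, and how a guarded `initialize` call in `lifespan` enables them without letting a broken factory block startup.

```python
from __future__ import annotations

import asyncio
import contextlib
import logging
import os
from typing import Callable, Protocol

log = logging.getLogger(__name__)


class StatsLogger(Protocol):
    def incr(self, stat: str, count: int = 1, tags: dict[str, str] | None = None) -> None: ...


class NoStatsLogger:
    """Default backend: every call is a silent no-op, so metrics are dropped."""

    def incr(self, stat: str, count: int = 1, tags: dict[str, str] | None = None) -> None:
        pass


class RecordingStatsLogger:
    """Hypothetical stand-in for a real StatsD/OpenTelemetry backend; keeps metrics in memory."""

    def __init__(self) -> None:
        self.emitted: list[tuple[str, int, dict[str, str]]] = []

    def incr(self, stat: str, count: int = 1, tags: dict[str, str] | None = None) -> None:
        self.emitted.append((stat, count, dict(tags or {})))


class Stats:
    """Hypothetical model of an explicitly initialized, per-process metrics singleton."""

    instance: StatsLogger = NoStatsLogger()
    _pid: int | None = None

    @classmethod
    def initialize(cls, factory: Callable[[], StatsLogger]) -> None:
        # PID guard: initialize at most once per process. A forked worker has a new
        # PID, so it builds its own backend instead of reusing the parent's.
        if cls._pid == os.getpid():
            return
        cls.instance = factory()  # if the factory raises, the no-op backend stays in place
        cls._pid = os.getpid()

    @classmethod
    def incr(cls, stat: str, count: int = 1, tags: dict[str, str] | None = None) -> None:
        cls.instance.incr(stat, count, tags)


def make_lifespan(factory: Callable[[], StatsLogger]):
    """Build a FastAPI-style lifespan: an async context manager that receives the app."""

    @contextlib.asynccontextmanager
    async def lifespan(app: object):
        try:
            Stats.initialize(factory=factory)
        except Exception:
            # A metrics misconfiguration must never block server startup.
            log.exception("Stats initialization failed; metrics will be dropped")
        yield

    return lifespan


async def serve_once(lifespan) -> None:
    """Enter the lifespan, then record one metric as a heartbeat handler might."""
    async with lifespan(app=None):
        Stats.incr("edge_worker.example_metric", tags={"worker_name": "worker-1"})


if __name__ == "__main__":
    backend = RecordingStatsLogger()
    asyncio.run(serve_once(make_lifespan(lambda: backend)))
    print(backend.emitted)  # [('edge_worker.example_metric', 1, {'worker_name': 'worker-1'})]
```

In this model the PID guard means that calling `initialize` from `lifespan`, which each forked worker runs, gives every worker process its own backend. Swallowing the exception leaves the no-op backend in place, so a bad metrics configuration degrades to dropped metrics instead of a failed start.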
