diogosilva30 commented on issue #68077:
URL: https://github.com/apache/airflow/issues/68077#issuecomment-4635221726

   ## UPDATE
   
   **This issue is not reproducible on the `main` branch. Here's why:**
   
   I reproduced this end-to-end on a real stack (`breeze start-airflow 
--executor EdgeExecutor --integration statsd` — real API server + real `airflow 
edge worker` + statsd-exporter) and ran an A/B test with the proposed 
`lifespan` init toggled on and off. With the init **disabled**, `edge_worker_*` 
series were still present on `main` and `edge_worker.heartbeat_count` kept 
incrementing. So the bug as written reproduces on **3.2.x**, but **not** on 
current `main`.
   
   ### Root cause of the difference
   
   `serde/__init__.py` calls `Stats.initialize(...)` as an **import-time side 
effect** (via `_register()`). On both versions the relevant import chain is the 
same:
   
   ```
   airflow/__init__.py   settings.initialize()
     → configuration.py   mask_secrets() → mask_secret_sdk()
     → sdk/log.py         mask_secret() → from airflow.sdk.execution_time 
import task_runner
   ```
   
   The difference is entirely in **how `task_runner.py` imports `serde`**:
   
   - **`main`** — `task_runner` has a **module-level** import of `serde`:
     
[`task_runner.py#L140`](https://github.com/apache/airflow/blob/main/task-sdk/src/airflow/sdk/execution_time/task_runner.py#L140)
     ```python
     from airflow.sdk.serde import allow_class, iter_pydantic_models
     ```
     So importing `task_runner` imports `serde`, whose `_register()` runs 
`Stats.initialize(...)`. That fires on **any `import airflow`**, in **every** 
process — including the API server. Verified: a bare `import 
airflow.api_fastapi.app` leaves the SDK `Stats` `_factory` set to a real 
backend, not `NoStatsLogger`.
   
   - **3.2.x** — `task_runner` imports `serde` **only lazily, inside 
functions** (`serialize`/`deserialize`), and `Stats.initialize` is likewise 
only invoked inside a function (which runs in the task subprocess, not the API 
server). So `import airflow` in the API server **never** triggers 
`serde._register()`, the SDK `Stats` singleton stays `NoStatsLogger`, and Edge 
metrics are silently dropped — exactly the reported symptom.
   
   So on `main` the API server's SDK `Stats` ends up initialized **only 
incidentally**, because it happens to inherit a `Stats.initialize` from 
`serde`'s import side effect via an unrelated mask-secrets → `task_runner` → 
`serde` chain.
   
   ### Proposal: frame the fix as hardening, not a bug fix
   
   Rather than relying on that implicit coupling, I'd suggest framing the 
`lifespan` init as **explicit hardening**:
   
   - The API server does not actually *want* `serde` at startup — the `Stats` 
initialization it currently gets is an **accidental side effect** of an 
unrelated import chain. That's fragile: it would silently regress again if 
`task_runner`'s module-level `serde` import were ever made lazy, or if the 
mask-secrets import path changed.
   - Adding an explicit `Stats.initialize(...)` in the API server's FastAPI 
`lifespan` makes the dependency **intentional and self-contained**. It's a 
**no-op on `main`** (already initialized) and a **real fix on 3.2.x**, while 
removing reliance on an import side effect that no one is guaranteeing.
   
   So I'd propose keeping the PR, but reframing it from "API server never 
initializes Stats (bug)" to "**explicitly initialize Task SDK Stats in the API 
server lifespan instead of depending on an import side effect (hardening)**", 
plus optionally a 3.2.x backport where the bug actually bites.
   
   Happy to adjust the PR wording/scope to match whichever framing maintainers 
prefer.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to