shubhamraj-git opened a new pull request, #67455: URL: https://github.com/apache/airflow/pull/67455
The OTel integration test methods (`test_export_legacy_metric_names`) failed
intermittently with `Failed: Timeout >90.0s` at `scheduler_process.wait()`.
**Root cause:** The `execution_timeout(90)` was sized to match
`wait_for_dag_run(max_wait_time=90)` but ignored the fixed overheads around it:
- 10s: startup sleep (`start_scheduler`)
- 90s: dag run wait (`wait_for_dag_run` max)
- 10s: post-run sleep (span_status propagation)
- Xs: scheduler shutdown, which includes the OTel atexit flush (`force_flush`, up to 10s)

Total: 110s+ worst case, which exceeds the 90s limit.
When the dag run took ~65–70s, only 5–10s remained for scheduler shutdown,
which is not enough for `force_flush()`. The `scheduler_process.wait()` call
had no timeout, so it blocked in `os.waitpid()` until SIGALRM fired. The
60→90 bump in #67170 was still too small for the same reason.
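
To make the arithmetic above easy to check, here is a minimal sketch that adds up the fixed overheads. The constant and function names are hypothetical and do not come from the test module; only the numbers are taken from the breakdown above.

```python
# Hypothetical names; the values mirror the breakdown in this description.
STARTUP_SLEEP_S = 10      # sleep after starting the scheduler
DAG_RUN_WAIT_MAX_S = 90   # wait_for_dag_run(max_wait_time=90)
POST_RUN_SLEEP_S = 10     # sleep for span_status propagation
OLD_LIMIT_S = 90          # previous execution_timeout


def worst_case_seconds(shutdown_s: int) -> int:
    """Worst-case wall time of one test for a given scheduler shutdown time."""
    return STARTUP_SLEEP_S + DAG_RUN_WAIT_MAX_S + POST_RUN_SLEEP_S + shutdown_s


# Even with an instantaneous shutdown, the fixed overheads exceed the old limit.
fixed_overhead_s = worst_case_seconds(0)
print(fixed_overhead_s, fixed_overhead_s > OLD_LIMIT_S)
```

The point of the sketch is that the old limit was already exceeded before any shutdown time was counted, so any slow flush pushed the test over.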
**Changes:**
- Replace each bare subprocess `wait()` with `wait(timeout=30)`; on
`TimeoutExpired`, call `kill()` and then `wait()`. This applies to both `finally` blocks.
- Raise `execution_timeout` from 90 to 160 on all three test methods.
Budget: 10 + 90 + 10 + 30 = 140s worst case + 20s CI buffer = 160s.
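
For reviewers who want the shape of the bounded wait without opening the diff, a rough sketch of the pattern follows. The helper name `stop_process` and its `grace_seconds` parameter are hypothetical; the actual change may inline this logic in each `finally` block, and any termination signal sent before the wait is not shown here.

```python
import subprocess
import sys


def stop_process(proc: subprocess.Popen, grace_seconds: float = 30.0) -> int:
    """Wait up to grace_seconds for proc to exit; kill it if it does not.

    Hypothetical helper illustrating wait(timeout=...) + kill() + wait().
    """
    try:
        # Bounded wait: raises TimeoutExpired instead of blocking forever.
        return proc.wait(timeout=grace_seconds)
    except subprocess.TimeoutExpired:
        # The process overran its grace period, so force it down.
        proc.kill()
        # Reap the killed process so it does not linger as a zombie.
        return proc.wait()


if __name__ == "__main__":
    # Stand-in for a scheduler that is slow to shut down.
    slow = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
    print("exit code:", stop_process(slow, grace_seconds=0.5))
```

The second `wait()` after `kill()` is unbounded on purpose: a killed process exits promptly, and reaping it keeps the test from leaving a zombie behind.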
---
##### Was generative AI tooling used to co-author this PR?
- [X] Yes — Claude Code (claude-sonnet-4-6)
Generated-by: Claude Code (claude-sonnet-4-6) following [the
guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions)
