potiuk opened a new pull request, #67882: URL: https://github.com/apache/airflow/pull/67882
## Symptom

[run 26764917219, job 78891341528](https://github.com/apache/airflow/actions/runs/26764917219/job/78891341528) (Compat 3.2.2 / P3.10, Tests ARM): **33 errors**, all `pymongo.errors.ServerSelectionTimeoutError: ... Connection refused` at the **setup of every `TestMongoHook` test**; 10907 passed.

## Root cause

The mongo hook tests use a session-scoped `MongoDbContainer` (testcontainers). The container started fine early in the run (`Container started` at 16:26:51, and the `_wait_for_mongo_ready` ping-gate passed; there is no "did not answer ping" in the log), but the mongo module runs much later in this ~17-minute compat suite. By then **testcontainers' `ryuk` reaper had removed the container**: ryuk reaps spawned containers a short time after the controlling connection drops.

Corroboration from the job log:

- ryuk was enabled (`TESTCONTAINERS_RYUK_DISABLED` unset; `Pulling image testcontainers/ryuk:0.8.1` / `Container started`).
- breeze's failure handler tried to dump the mongo container's logs, but they were **empty**: the container was already gone.

The existing 3× start-retry + ping-gate cannot help once ryuk removes the container mid-suite (and a per-test connection retry wouldn't either, because the container is gone for the rest of the run).

## Fix

Set `TESTCONTAINERS_RYUK_DISABLED=true` in `providers/mongo/tests/conftest.py` before any `MongoDbContainer` is created, but **only in CI** (`CI` / `GITHUB_ACTIONS`). The fixture already stops the container explicitly in its `finally` block and CI runners are ephemeral, so ryuk's auto-reaping is unnecessary there. Local runs keep ryuk enabled so a container left by an interrupted test run is still cleaned up.

Test-infra only (`providers/mongo/tests/conftest.py`). No newsfragment (providers don't consume them).

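
For illustration, the CI-only toggle described in the fix could look roughly like the sketch below. This is not the code from the PR: the helper name `disable_ryuk_in_ci`, its optional `environ` parameter, and the exact truthiness check on `CI` / `GITHUB_ACTIONS` are assumptions made for the example. Only the variable names `TESTCONTAINERS_RYUK_DISABLED`, `CI`, and `GITHUB_ACTIONS` come from the description above.

```python
import os

# Hypothetical helper; the name, signature, and truthiness rule are
# illustrative assumptions, not the PR's actual implementation.
_CI_MARKERS = ("CI", "GITHUB_ACTIONS")


def disable_ryuk_in_ci(environ=None):
    """Disable the testcontainers ryuk reaper only when running in CI.

    Returns True if a CI marker was found and the variable was set,
    False if the environment was left untouched (for example a local run).
    """
    env = os.environ if environ is None else environ
    # Assumption: a marker counts as "on" when its value is "true" or "1".
    in_ci = any(
        env.get(name, "").strip().lower() in ("true", "1")
        for name in _CI_MARKERS
    )
    if in_ci:
        env["TESTCONTAINERS_RYUK_DISABLED"] = "true"
    return in_ci
```

In a `conftest.py`, such a helper would be called once at module level, before any container is created. Depending on the testcontainers version, the setting may be read when the library is imported, so calling the helper ahead of the `testcontainers` import is likely the safer placement.
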
> Note: this flake only reproduces in the long compat / docker-in-docker CI run, so it can't be reproduced locally; the fix is grounded in the job-log evidence (ryuk enabled + empty dumped container logs + container-gone-mid-suite), and disabling ryuk is the documented testcontainers remedy when container lifecycle is managed explicitly.

---

##### Was generative AI tooling used to co-author this PR?

- [X] Yes, Claude Code (Opus 4.8)

Generated-by: Claude Code (Opus 4.8) following [the guidelines](https://github.com/apache/airflow/blob/main/contributing-docs/05_pull_requests.rst#gen-ai-assisted-contributions)
