Abhishekmishra2808 commented on issue #61070: URL: https://github.com/apache/airflow/issues/61070#issuecomment-3818105209
Hi @jason810496,

### Deep Dive: Root Cause Analysis

After reviewing the full traceback, the `AssertionError` appears to be a secondary symptom. The primary cause is a **networking/DNS failure** within the Breeze environment.

**Key Evidence from Traceback:**

* `socket.gaierror: [Errno -3] Temporary failure in name resolution`
* `urllib3.exceptions.NameResolutionError: Failed to resolve 'breeze-otel-collector'`

**What's happening:**

The OpenTelemetry SDK is attempting to export metrics/spans to `http://breeze-otel-collector:4318/v1/metrics`, but the hostname cannot be resolved. The export fails and the spans are dropped, which is why `task2` is missing from the children spans during the test assertion.

Furthermore, there's a PID conflict (`api_server is already running under PID 124`), suggesting that the CI runner might be suffering from "leaked" processes from previous steps, which can interfere with Docker network stability.

### Proposed Fixes:

1. **Breeze Orchestration:** Ensure the `otel-collector` service is explicitly listed as a dependency for integration tests and passes a health check before the tests begin.
2. **Robustness:** Wrap the span assertion in a retry block (as previously suggested) to handle cases where the collector is briefly unreachable during startup.
3. **CI Cleanup:** Ensure a cleaner teardown of the `api_server` to prevent PID conflicts that might be locking network resources.
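To make the retry and readiness ideas above concrete, here is a minimal sketch using only the Python standard library. The helper names `can_resolve` and `retry_assertion`, along with the attempt counts and delays, are hypothetical and chosen for illustration; they are not existing Airflow or Breeze APIs, and the actual span check in the test would need to be passed in by the caller.

```python
import socket
import time
from typing import Callable


def can_resolve(hostname: str, port: int) -> bool:
    """Return True if the hostname resolves, False on a DNS failure.

    Hypothetical pre-flight check: a test could call this for the
    collector host before asserting on exported spans.
    """
    try:
        socket.getaddrinfo(hostname, port)
        return True
    except socket.gaierror:
        # Same exception family as the one in the traceback
        # ("Temporary failure in name resolution").
        return False


def retry_assertion(
    check: Callable[[], None],
    attempts: int = 5,
    delay: float = 1.0,
) -> None:
    """Call check() until it stops raising AssertionError.

    Re-raises the last AssertionError once all attempts are used,
    so a genuine failure is still reported by the test runner.
    """
    for attempt in range(1, attempts + 1):
        try:
            check()
            return
        except AssertionError:
            if attempt == attempts:
                raise
            time.sleep(delay)
```

In a test, this might be used as `retry_assertion(lambda: assert_children_spans(...))`, where `assert_children_spans` stands in for whatever assertion the test already makes. A retry only helps if the collector becomes reachable shortly after startup; it would not mask a hostname that never resolves, since the final attempt still raises.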
