Abhishekmishra2808 commented on issue #61070:
URL: https://github.com/apache/airflow/issues/61070#issuecomment-3818105209

   Hi @jason810496,
   ### Deep Dive: Root Cause Analysis
   After reviewing the full traceback, the `AssertionError` is a secondary 
symptom. The primary cause is a **Networking/DNS failure** within the Breeze 
environment.
   
   **Key Evidence from Traceback:**
   * `socket.gaierror: [Errno -3] Temporary failure in name resolution`
   * `urllib3.exceptions.NameResolutionError: Failed to resolve 
'breeze-otel-collector'`
   
   **What's happening:**
   The OpenTelemetry SDK is attempting to export metrics/spans to 
`http://breeze-otel-collector:4318/v1/metrics`, but the hostname cannot be 
resolved. This leads to dropped spans, which is why `task2` is missing from the 
children spans during the test assertion.
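
To confirm the DNS part from inside the Breeze container, a check along these lines may help. This is only a minimal sketch: `can_resolve` is a hypothetical helper (not part of Airflow or Breeze), and the host and port are the ones from the exporter URL in the traceback.

```python
import socket


def can_resolve(host, port, resolver=socket.getaddrinfo):
    """Return True if `host` resolves, False on a name-resolution error.

    `resolver` is injectable only so the helper can be exercised without
    a real network; by default it is the standard library lookup.
    """
    try:
        resolver(host, port)
        return True
    except socket.gaierror:
        # Same exception class as the one reported in the traceback.
        return False


if __name__ == "__main__":
    # Hostname and port taken from the exporter URL in the traceback.
    print(can_resolve("breeze-otel-collector", 4318))
```

If this prints `False` inside the test container, the collector name is not visible on that Docker network, which would match the `NameResolutionError` above.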
   
   Furthermore, the log reports a PID conflict (`api_server is already running 
under PID 124`), which suggests the CI runner may be carrying processes 
"leaked" from previous steps. Whether that also affects Docker network 
stability is unconfirmed, but it is worth ruling out.
   
   ### Proposed Fixes:
   1. **Breeze Orchestration:** Ensure the `otel-collector` service is 
explicitly listed as a dependency for integration tests and passes a health 
check before the tests begin.
   2. **Robustness:** Wrap the span assertion in a retry block (as previously 
suggested) to handle cases where the collector is briefly unreachable during 
startup.
   3. **CI Cleanup:** Ensure a cleaner teardown of the `api_server` to prevent 
PID conflicts that might be locking network resources.
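
For fix 2, the retry could look roughly like the sketch below. It is an illustration under assumptions, not the actual test code: `retry_assert` is a hypothetical helper, and the simulated span list stands in for however the test really reads the exported spans.

```python
import time


def retry_assert(check, attempts=5, delay=1.0, sleep=time.sleep):
    """Call `check` until it stops raising AssertionError or attempts run out."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return check()
        except AssertionError:
            if attempt == attempts:
                raise  # out of attempts: surface the original failure
            sleep(delay)


if __name__ == "__main__":
    # Simulated span store: "task2" only shows up on the third poll.
    polls = iter([["task1"], ["task1"], ["task1", "task2"]])

    def task2_is_exported():
        assert "task2" in next(polls)

    retry_assert(task2_is_exported, attempts=5, delay=0.1)
    print("task2 span found")
```

A retry like this only covers a collector that is briefly unreachable during startup. If the hostname never resolves, the assertion will still fail after the last attempt, so fix 1 is still needed.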


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
