anmolxlight opened a new pull request, #67400: URL: https://github.com/apache/airflow/pull/67400
## Summary

The OpenLineage listener uses a `ProcessPoolExecutor` to asynchronously emit lineage events from the scheduler. When a child process in the pool terminates abruptly, Python's `concurrent.futures` marks the pool as permanently broken. After that point, every subsequent OpenLineage event fails with `BrokenProcessPool` and lineage data stops flowing indefinitely; only a scheduler restart recovers it.

## Fix

`submit_callable` now catches `BrokenProcessPool`, shuts down the broken executor, creates a fresh one, and retries the submission. This makes the listener self-healing: lineage reporting recovers automatically without a scheduler restart.

### Changes

- `listener.py`: catch `BrokenProcessPool` in `submit_callable`, recreate the executor, and retry
- `test_listener.py`: add `test_submit_callable_recreates_executor_on_broken_pool` that verifies the broken pool is shut down, a new executor is created, and the submission is retried

## Test Plan

- [x] New unit test passes
- [x] All existing OpenLineage listener unit tests pass (26 passed, 35 skipped, 0 failed)

Closes #67283
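
For readers without the diff at hand, the catch, recreate, and retry behavior described under Fix can be sketched roughly as follows. This is a standalone illustration, not the code in `listener.py`: the `SelfHealingSubmitter` class, its `executor_factory` parameter, and the exact signature of `submit_callable` are assumptions made for the example. It also only covers a pool that is already broken at submit time; a task that was in flight when the worker died still fails through its future.

```python
from concurrent.futures import Executor, Future, ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool
from typing import Any, Callable


class SelfHealingSubmitter:
    """Hypothetical wrapper for illustration; not the actual Airflow listener."""

    def __init__(
        self, executor_factory: Callable[[], Executor] = ProcessPoolExecutor
    ) -> None:
        # The factory is kept so a replacement pool can be built later.
        self._executor_factory = executor_factory
        self._executor = executor_factory()

    def submit_callable(
        self, fn: Callable[..., Any], *args: Any, **kwargs: Any
    ) -> Future:
        try:
            return self._executor.submit(fn, *args, **kwargs)
        except BrokenProcessPool:
            # A worker died abruptly, so this pool rejects all further work.
            # Release it without waiting, then build a fresh pool.
            self._executor.shutdown(wait=False)
            self._executor = self._executor_factory()
            # Retry exactly once; a second failure propagates to the caller.
            return self._executor.submit(fn, *args, **kwargs)
```

Retrying only once keeps a persistently crashing worker from turning into an endless recreate loop; whether the actual change bounds retries the same way is not stated in the description above.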
