potiuk opened a new issue, #67044:
URL: https://github.com/apache/airflow/issues/67044

   ### Apache Airflow version
   
   main
   
   ### What happened?
   
   
   `providers/openlineage/tests/unit/openlineage/extractors/test_base.py::test_default_extractor_uses_wrong_operatorlineage_class`
   fails on the `Compat 3.0.6:P3.10` and `Compat 3.1.8:P3.10` provider-distributions jobs. The assertion receives an `OperatorLineage` that contains `Dataset` entries leaked from `providers/common/ai/.../durable/test_storage.py`:
   
   ```
   FAILED providers/openlineage/tests/unit/openlineage/extractors/test_base.py::test_default_extractor_uses_wrong_operatorlineage_class
     Expected: OperatorLineage(inputs=[], outputs=[], run_facets={}, job_facets={})
     Got:      OperatorLineage(
       inputs=[Dataset(namespace='file', name='/tmp/pytest-of-root/pytest-0/test_save_and_load_roundtrips0/test_dag_my_task_run_1.json'), ...],
       outputs=[Dataset(namespace='file', name='/tmp/pytest-of-root/pytest-0/test_save_and_load_roundtrips0/test_dag_my_task_run_1.json'), ...],
       run_facets={}, job_facets={},
     )
   ```
   
   ### Root cause
   
   The hook lineage collector returned by `get_hook_lineage_collector()` is a **process-wide singleton**. The `DurableStorage` tests in `providers/common/ai/tests/unit/common/ai/durable/test_storage.py` exercise `ObjectStoragePath` reads/writes (via the `storage` fixture with `dag_id="test_dag", task_id="my_task", run_id="run_1"`). On the Compat 3.0.x/3.1.x targets the underlying SDK ships a real `HookLineageCollector` (not the `NoOpCollector` used in current `main`), so those file operations register as input/output assets in the singleton.
   
   When `test_default_extractor_uses_wrong_operatorlineage_class` then calls `ExtractorManager().extract_metadata(...)`, the extractor returns an invalid lineage class, so `validate_task_metadata` yields `None` → empty `OperatorLineage()`. The manager then falls through to `get_hook_lineage()` (see `providers/openlineage/src/airflow/providers/openlineage/extractors/manager.py:130-135`), reads the polluted singleton, and merges those leaked datasets into the result. The assertion fails.
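
The leak described above can be modelled without Airflow. The following is a minimal sketch, not the real implementation: `ToyCollector`, `ToyLineage`, `get_toy_collector`, `storage_roundtrip` and `extract_metadata` are hypothetical stand-ins for the collector singleton, `OperatorLineage`, the `DurableStorage` test and the manager's fallback. It assumes the fallback only reads the collector when the extractor produced nothing.

```python
from dataclasses import dataclass, field


@dataclass
class ToyLineage:
    """Hypothetical stand-in for OperatorLineage (inputs/outputs only)."""
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


class ToyCollector:
    """Hypothetical stand-in for a real (non no-op) hook lineage collector."""

    def __init__(self):
        self.inputs = []
        self.outputs = []


_COLLECTOR = None  # module-level state: one instance per process


def get_toy_collector():
    """Lazily create and return the process-wide collector."""
    global _COLLECTOR
    if _COLLECTOR is None:
        _COLLECTOR = ToyCollector()
    return _COLLECTOR


def storage_roundtrip(path):
    """Models a storage test: a write then a read, both registered globally."""
    collector = get_toy_collector()
    collector.outputs.append(path)
    collector.inputs.append(path)


def extract_metadata(extractor_result):
    """Models the fallback: an invalid result (None) becomes empty lineage,
    and empty lineage is then filled from whatever the singleton holds."""
    lineage = extractor_result if extractor_result is not None else ToyLineage()
    if not lineage.inputs and not lineage.outputs:
        collector = get_toy_collector()
        lineage = ToyLineage(list(collector.inputs), list(collector.outputs))
    return lineage
```

Calling `extract_metadata(None)` before any storage activity returns an empty `ToyLineage`; calling it after `storage_roundtrip(...)` in the same process returns the leaked path, which is the order-dependent behaviour seen on the Compat jobs.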
   
   This is invisible on `main` because:
   - `NoOpCollector` is returned by `get_hook_lineage_collector()` when no hook lineage reader plugin is installed (`task-sdk/src/airflow/sdk/lineage.py:340-347`), so locally / in non-Compat CI there's no state to leak.
   - Scheduled jobs on `main` only run `Compat 2.10.5:P3.9` and `Compat 3.0.0:P3.9`. The `common.ai` provider isn't installed there (see `PROVIDERS_COMPATIBILITY_TESTS_MATRIX` in `dev/breeze/src/airflow_breeze/global_constants.py`).
   - The `common.ai` durable storage path was added in #64199 (Add durable execution for `AgentOperator`), making this a recent regression on the Compat 3.0.6/3.1.8 matrix entries.
   
   ### What you think should happen instead?
   
   Either:
   1. The openlineage test should isolate itself from the global collector 
(done in the immediate-fix PR).
   2. Tests that exercise `ObjectStoragePath` (or other operations that 
register with the hook lineage collector) should reset the collector in 
teardown, so cross-test pollution isn't possible.
   
   The immediate fix is (1). (2) is the more durable fix and worth doing as a 
follow-up — any other extractor test that falls through to `get_hook_lineage()` 
is potentially affected by the same pattern.
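
Option (2) could look roughly like the sketch below. It is an illustration under assumptions, not the actual fix: `ToyCollector`, `get_toy_collector` and `collector_reset_on_teardown` are hypothetical names, and the real collector may need a different reset mechanism. In pytest the same shape would typically be a `yield` fixture; a plain context manager is used here so the example runs on its own.

```python
import contextlib


class ToyCollector:
    """Hypothetical stand-in for the hook lineage collector."""

    def __init__(self):
        self.inputs = []
        self.outputs = []


_COLLECTOR = None  # process-wide singleton slot


def get_toy_collector():
    """Lazily create and return the process-wide collector."""
    global _COLLECTOR
    if _COLLECTOR is None:
        _COLLECTOR = ToyCollector()
    return _COLLECTOR


@contextlib.contextmanager
def collector_reset_on_teardown():
    """Let the test body use the shared collector, then drop it on exit.

    The reset runs in `finally`, so it also happens when the test body
    raises; the next caller of get_toy_collector() gets a fresh instance.
    """
    global _COLLECTOR
    try:
        yield get_toy_collector()
    finally:
        _COLLECTOR = None


# Usage: a "storage test" registers a path, but nothing survives teardown.
with collector_reset_on_teardown() as collector:
    collector.outputs.append("/tmp/example.json")
```

Resetting in teardown keeps the cleanup next to the code that causes the pollution, so later tests that fall through to the collector do not each need their own isolation.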
   
   ### How to reproduce
   
   Trigger CI on any branch that touches a path picked up by the Compat 
3.0.6/3.1.8 jobs (selective-checks decides). The failure is deterministic when 
both `common.ai` durable storage tests and `openlineage` extractor tests run in 
the same pytest session.
   
   ### Operating System
   
   Linux (CI containers)
   
   ### Versions of Apache Airflow Providers
   
   main
   
   ### Deployment
   
   Other
   
   ### Deployment details
   
   Not applicable — affects CI only.
   
   ### Anything else?
   
   Surfaced while rebasing #66825 (an unrelated FileTrigger test fix); blocking 
that PR's CI.
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   
   ---
   Drafted-by: Claude Code (Opus 4.7); reviewed by @potiuk before posting

