potiuk opened a new issue, #67044:
URL: https://github.com/apache/airflow/issues/67044
### Apache Airflow version
main
### What happened?
`providers/openlineage/tests/unit/openlineage/extractors/test_base.py::test_default_extractor_uses_wrong_operatorlineage_class`
fails on the `Compat 3.0.6:P3.10` and `Compat 3.1.8:P3.10`
provider-distributions jobs, asserting on an `OperatorLineage` that contains
leaked `Dataset` entries from `providers/common/ai/.../durable/test_storage.py`:
```
FAILED providers/openlineage/tests/unit/openlineage/extractors/test_base.py::test_default_extractor_uses_wrong_operatorlineage_class
Expected: OperatorLineage(inputs=[], outputs=[], run_facets={}, job_facets={})
Got: OperatorLineage(
    inputs=[Dataset(namespace='file', name='/tmp/pytest-of-root/pytest-0/test_save_and_load_roundtrips0/test_dag_my_task_run_1.json'), ...],
    outputs=[Dataset(namespace='file', name='/tmp/pytest-of-root/pytest-0/test_save_and_load_roundtrips0/test_dag_my_task_run_1.json'), ...],
    run_facets={}, job_facets={},
)
```
### Root cause
The hook lineage collector returned by `get_hook_lineage_collector()` is a
**process-wide singleton**. The `DurableStorage` tests in
`providers/common/ai/tests/unit/common/ai/durable/test_storage.py` exercise
`ObjectStoragePath` reads/writes (via the `storage` fixture with
`dag_id="test_dag", task_id="my_task", run_id="run_1"`). On the Compat
3.0.x/3.1.x targets the underlying SDK ships a real `HookLineageCollector` (not
the `NoOpCollector` used in current `main`), so those file operations register
as input/output assets in the singleton.
When `test_default_extractor_uses_wrong_operatorlineage_class` then calls
`ExtractorManager().extract_metadata(...)`, the extractor returns an invalid
lineage class, so `validate_task_metadata` yields `None` and the result starts
out as an empty `OperatorLineage()`. The manager then falls through to
`get_hook_lineage()` (see
`providers/openlineage/src/airflow/providers/openlineage/extractors/manager.py:130-135`),
reads the polluted singleton, and merges those leaked datasets into the
result. The test expects an empty `OperatorLineage`, so the assertion fails.
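The leak can be illustrated with a toy model. This is only a sketch of the pattern described above, not Airflow's code: `ToyCollector`, `get_toy_collector`, `storage_roundtrip`, and `extract_with_fallback` are hypothetical names, and the fallback is simplified to "return whatever the shared collector holds".

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyCollector:
    """Hypothetical stand-in for a hook lineage collector (not Airflow's class)."""

    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)


_COLLECTOR: Optional[ToyCollector] = None


def get_toy_collector() -> ToyCollector:
    """Return one shared collector for the whole process, creating it lazily."""
    global _COLLECTOR
    if _COLLECTOR is None:
        _COLLECTOR = ToyCollector()
    return _COLLECTOR


def storage_roundtrip(path: str) -> None:
    """Mimic a save followed by a load; both register with the shared collector."""
    collector = get_toy_collector()
    collector.outputs.append(path)  # the write
    collector.inputs.append(path)   # the read


def extract_with_fallback(extractor_result: Optional[dict]) -> dict:
    """Use the extractor result if valid, otherwise fall back to the collector."""
    if extractor_result is not None:
        return extractor_result
    collector = get_toy_collector()
    return {"inputs": list(collector.inputs), "outputs": list(collector.outputs)}


# "Test A": a storage test that happens to run first in the session.
storage_roundtrip("/tmp/test_dag_my_task_run_1.json")

# "Test B": an extractor test whose extractor yields nothing valid.
# It expects empty lineage but sees test A's datasets instead.
leaked = extract_with_fallback(None)
print(leaked)
```

Neither toy test is wrong on its own; the failure only appears when both share one process and run in that order.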
This is invisible on `main`, and only surfaced recently, because:
- `NoOpCollector` is returned by `get_hook_lineage_collector()` when no hook
lineage reader plugin is installed
(`task-sdk/src/airflow/sdk/lineage.py:340-347`), so locally / in non-Compat CI
there's no state to leak.
- Scheduled jobs on `main` only run `Compat 2.10.5:P3.9` and `Compat
3.0.0:P3.9`. The `common.ai` provider isn't installed there (see
`PROVIDERS_COMPATIBILITY_TESTS_MATRIX` in
`dev/breeze/src/airflow_breeze/global_constants.py`).
- The `common.ai` durable storage path was added in #64199 (Add durable
execution for `AgentOperator`), making this a recent regression on the Compat
3.0.6/3.1.8 matrix entries.
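As a rough illustration of the first point, the sketch below contrasts a no-op collector with a recording one. The class and function names are hypothetical and only model the behavior described above; they are not the task SDK's implementation.

```python
class ToyNoOpCollector:
    """Hypothetical no-op collector: accepts calls and stores nothing."""

    def add_input(self, uri: str) -> None:
        pass

    def collected(self) -> list:
        return []


class ToyRecordingCollector:
    """Hypothetical recording collector: remembers every registered input."""

    def __init__(self) -> None:
        self._inputs = []

    def add_input(self, uri: str) -> None:
        self._inputs.append(uri)

    def collected(self) -> list:
        return list(self._inputs)


def run_two_tests(use_noop: bool) -> list:
    """Share one collector between a writing test and a reading test."""
    shared = ToyNoOpCollector() if use_noop else ToyRecordingCollector()
    # First test: a file operation registers an input with the collector.
    shared.add_input("file:///tmp/test_dag_my_task_run_1.json")
    # Second test: reads whatever the shared collector holds.
    return shared.collected()


like_main = run_two_tests(use_noop=True)     # nothing to leak
like_compat = run_two_tests(use_noop=False)  # first test's input is visible
print(like_main, like_compat)
```

With the no-op variant the ordering dependency still exists, but it has no observable effect, which is why the problem does not show up locally.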
### What you think should happen instead?
Either:
1. The openlineage test should isolate itself from the global collector
(done in the immediate-fix PR).
2. Tests that exercise `ObjectStoragePath` (or other operations that
register with the hook lineage collector) should reset the collector in
teardown, so cross-test pollution isn't possible.
The immediate fix is (1). (2) is the more durable fix and worth doing as a
follow-up — any other extractor test that falls through to `get_hook_lineage()`
is potentially affected by the same pattern.
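Both options can be sketched against a toy singleton. The names below (`ToyCollector`, `ToyRegistry`, `isolated_collector`, `reset_collector_on_exit`) are hypothetical and say nothing about how the actual fix PR is written; in a pytest suite the two context managers would more naturally be `yield` fixtures.

```python
from contextlib import contextmanager
from unittest import mock


class ToyCollector:
    """Hypothetical collector that just records dataset URIs."""

    def __init__(self) -> None:
        self.datasets = []


class ToyRegistry:
    """Hypothetical holder for a process-wide collector instance."""

    _instance = None

    @classmethod
    def get(cls) -> ToyCollector:
        if cls._instance is None:
            cls._instance = ToyCollector()
        return cls._instance


@contextmanager
def isolated_collector():
    """Option (1): the consuming test swaps in a fresh collector for its duration."""
    with mock.patch.object(ToyRegistry, "_instance", ToyCollector()):
        yield ToyRegistry.get()


@contextmanager
def reset_collector_on_exit():
    """Option (2): the producing test clears the shared collector in teardown."""
    try:
        yield
    finally:
        ToyRegistry._instance = None


# An earlier test has already polluted the shared collector.
ToyRegistry.get().datasets.append("file:///tmp/leaked.json")

# Option (1): the consumer sees a clean collector, whatever ran before it.
with isolated_collector() as collector:
    seen_when_isolated = list(collector.datasets)

# The patch is undone on exit, so the polluted instance is back for other tests.
still_polluted = list(ToyRegistry.get().datasets)

# Option (2): the producer cleans up after itself, so nothing leaks at all.
with reset_collector_on_exit():
    ToyRegistry.get().datasets.append("file:///tmp/other.json")
seen_after_reset = list(ToyRegistry.get().datasets)

print(seen_when_isolated, still_polluted, seen_after_reset)
```

The sketch also shows why (1) alone is a local fix: once the isolated block exits, the polluted state is still there for any other test that reads the collector.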
### How to reproduce
Trigger CI on any branch that touches a path picked up by the Compat
3.0.6/3.1.8 jobs (selective checks decide whether those jobs run). The failure
is deterministic when both the `common.ai` durable storage tests and the
`openlineage` extractor tests run in the same pytest session.
### Operating System
Linux (CI containers)
### Versions of Apache Airflow Providers
main
### Deployment
Other
### Deployment details
Not applicable — affects CI only.
### Anything else?
Surfaced while rebasing #66825 (an unrelated FileTrigger test fix); blocking
that PR's CI.
### Are you willing to submit PR?
- [X] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
---
Drafted-by: Claude Code (Opus 4.7); reviewed by @potiuk before posting