kaxil commented on code in PR #67252:
URL: https://github.com/apache/airflow/pull/67252#discussion_r3277909699

##########
providers/common/ai/tests/unit/common/ai/durable/test_storage.py:
##########
@@ -35,7 +35,8 @@ def tmp_cache_path(tmp_path):
 @pytest.fixture
-def storage(tmp_cache_path):
+def storage(tmp_cache_path, hook_lineage_collector):
+    # ObjectStoragePath IO records hook lineage on compat Airflow versions; keep it isolated.

Review Comment:
   The same `ObjectStoragePath` pollution pattern exists in two other places in this provider, and they'll keep flaking the openlineage extractor test on Compat 3.0/3.1 even after this fix lands:

   - `providers/common/ai/tests/unit/common/ai/operators/test_agent.py::TestAgentOperatorDurable::test_execute_durable_wraps_model_and_cleans_up` patches `_get_base_path` to a real `file://` path (line 469) and lets `DurableStorage` write through it, so the hook lineage collector picks up the same `Dataset` entries that issue #67044 describes.
   - `providers/common/ai/tests/unit/common/ai/utils/test_file_analysis.py` is heavier: most tests in `TestBuildFileAnalysisRequest`, `TestFileAnalysisHelpers`, and `TestFormatReaders` (e.g. `test_text_file_analysis`, `test_detect_file_format`, `test_render_parquet_uses_lazy_import`, ...) construct `ObjectStoragePath(str(path))` against real `tmp_path` files and read through them, which registers inputs on the singleton collector.

   Would you mind extending this PR to either (a) add `hook_lineage_collector` to the affected fixtures/tests there too, or (b) introduce an autouse fixture at the `providers/common/ai/tests/unit/common/ai/conftest.py` level that resets the collector for every test in this provider?

   Option (b) matches the "more durable fix" the issue calls out and would prevent the next `ObjectStoragePath`-using test from re-introducing the same leak.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
