MatrixManAtYrService opened a new issue, #25891: URL: https://github.com/apache/airflow/issues/25891
### Apache Airflow version

main (development)

### What happened

Imagine that you're iterating on a dag file, so the version of it that airflow sees keeps changing. This issue has to do with airflow remembering dataset information from old versions of the dag that you're iterating on, sometimes in ways that cannot be cleared.

1. Start with two files that define many datasets along with dags/tasks that reference them (like `before_A.py` and `before_B.py` in this gist: https://gist.github.com/MatrixManAtYrService/41ba75355633c787cd81bc8079ee3e1f which define `a1` through `a9` and `b1` through `b11` respectively). Unpause all of them and run some of them (try: `start_a4` and `start_b11`), then wait for a few subsequent runs to complete (but there's a loop, so don't wait for all of them).
2. Delete one file (`before_B.py`) and overwrite the other (overwrite `before_A.py` with the contents of `after.py`) so that it defines fewer datasets. Some of the remaining datasets have different producers/consumers than in the previous version, and some are unchanged.
3. Wait for a while. Eventually all of the dags in the deleted file will disappear from the UI, and some of the dags from the updated file will disappear.
   a) The "Next Run" information now disagrees with the dependency graph (`a5` says "0 of 4 datasets updated": that's three from the prior definition, a1_a5, a3_a5, and a2_a5, and one from the current definition, a4_a5).
   b) The `/datasets` view shows producer/consumer counts which seem to indicate that the older files (with more dags and datasets) are still around.
4. Delete all visible DAGs and wait for them to reappear. The producer/consumer counts have been decremented in some cases but not in others. Here are a few cases:
   a) Dataset a1_a3 exists in both versions; it correctly shows 1/1.
   b) a1_a5 existed in the old version but not the new one; it correctly shows 0/0 (or should it just not be shown at all?).
   c) a3_a8 existed in the old version, and its consumer dag (a8) is no longer active but was not explicitly deleted; it now incorrectly shows 0/1.
   d) a2_a4 existed in the old version but not the new one; it incorrectly shows 1/0.

### What you think should happen instead

- `x of y datasets updated` in the DAGs view should not disagree with the dataset dependency graph about how many dependencies a dag has
- Producing/consuming counts in the dataset view should not include information from dags which are not shown in the DAGs view

### How to reproduce

See "What happened".

### Operating System

macOS

### Versions of Apache Airflow Providers

n/a

### Deployment

Virtualenv installation

### Deployment details

virtualenv install from main at 46b87def77

### Anything else

_No response_

### Are you willing to submit PR?

- [ ] Yes I am willing to submit a PR!

### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
