MatrixManAtYrService opened a new issue, #25891:
URL: https://github.com/apache/airflow/issues/25891

   ### Apache Airflow version
   
   main (development)
   
   ### What happened
   
   Imagine that you're iterating on a dag file, so the version of it that 
Airflow sees keeps changing.  This issue is about Airflow remembering dataset 
information from old versions of the dag that you're iterating on, sometimes 
in ways that cannot be cleared.
   
   1. Start with two files that define many datasets along with dags/tasks that 
reference them (like `before_A.py` and `before_B.py` in this gist: 
https://gist.github.com/MatrixManAtYrService/41ba75355633c787cd81bc8079ee3e1f 
which define `a1` through `a9` and `b1` through `b11` respectively).  Unpause 
all of them and run some of them (try: `start_a4` and `start_b11`), then wait 
for a few subsequent runs to complete (but there's a loop, so don't wait for 
all of them).
   2. Delete one file (`before_B.py`), and overwrite the other (overwrite 
`before_A.py` with the contents of `after.py`) so that it defines fewer 
datasets. Some of the remaining datasets have different producers/consumers 
than in the previous version, and some are unchanged.
   3. Wait for a while. Eventually all of the dags in the deleted file will 
disappear from the UI, and some of the dags from the updated file will 
disappear too. At this point:
   
   a) the "Next Run" information now disagrees with the dependency graph (`a5` 
says "0 of 4 datasets updated", that's three from the prior definition: a1_a5, 
a3_a5, and a2_a5 and one for the current definition: a4_a5)
   b) the `/datasets` view shows producer/consumer counts which seem to 
indicate that the older files (with more dags and datasets) are still around
   
   4. Delete all visible DAGs and wait for them to reappear.  The 
producer/consumer counts have been decremented in some cases but not in others. 
 Here are a few cases:
   
   a) dataset a1_a3 exists in both versions, it correctly shows 1/1
   b) a1_a5 existed in the old version, but not the new one, it correctly shows 
0/0 (or should it just not be shown at all?)
   c) a3_a8 existed in the old version; its consumer dag (a8) is no longer 
active but was not explicitly deleted, and it now incorrectly shows 0/1
   d) a2_a4 existed in the old version, but not the new one, it incorrectly 
shows 1/0
   
   ### What you think should happen instead
   
   - `x of y datasets updated` in the DAGs view should not disagree with the 
dataset dependency graph re: how many dependencies a dag has
   - Producing/consuming counts in the dataset view should not include 
information from dags which are not shown in the DAGs view
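
To make the two expectations above concrete, here is a minimal sketch in plain Python. It is a toy model, not Airflow's implementation: the `Ref` record and the `prune`, `counts`, and `required` helpers are hypothetical names, and the sets of current and stale references are assumptions chosen to mirror the cases described in "What happened" (the actual contents of the gist files may differ).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """One dag-to-dataset reference (hypothetical record, not an Airflow model)."""
    dataset: str  # dataset name, e.g. "a4_a5"
    dag_id: str   # dag that references the dataset
    role: str     # "producer" or "consumer"


def prune(stored, visible_dags, current):
    """Keep only references that belong to a visible dag and that the
    current dag definitions still declare."""
    current_set = set(current)
    return [r for r in stored if r.dag_id in visible_dags and r in current_set]


def counts(refs, dataset):
    """Return (producer_count, consumer_count) for one dataset."""
    producers = sum(1 for r in refs if r.dataset == dataset and r.role == "producer")
    consumers = sum(1 for r in refs if r.dataset == dataset and r.role == "consumer")
    return producers, consumers


def required(refs, dag_id):
    """Datasets a dag waits on, i.e. the 'y' in 'x of y datasets updated'."""
    return sorted(r.dataset for r in refs if r.dag_id == dag_id and r.role == "consumer")


# Assumed references declared by the *current* dag file.
current = [
    Ref("a1_a3", "a1", "producer"),
    Ref("a1_a3", "a3", "consumer"),
    Ref("a4_a5", "a4", "producer"),
    Ref("a4_a5", "a5", "consumer"),
]

# Assumed leftovers from the *old* dag files, modeled on the cases in the report.
stale = [
    Ref("a1_a5", "a5", "consumer"),
    Ref("a2_a5", "a5", "consumer"),
    Ref("a3_a5", "a5", "consumer"),
    Ref("a3_a8", "a8", "consumer"),
    Ref("a2_a4", "a2", "producer"),
]

stored = current + stale
visible = {"a1", "a3", "a4", "a5"}  # assumed set of dags shown in the DAGs view
cleaned = prune(stored, visible, current)

print(len(required(stored, "a5")), len(required(cleaned, "a5")))  # 4 1
print(counts(stored, "a3_a8"), counts(cleaned, "a3_a8"))          # (0, 1) (0, 0)
print(counts(stored, "a2_a4"), counts(cleaned, "a2_a4"))          # (1, 0) (0, 0)
print(counts(cleaned, "a1_a3"))                                   # (1, 1)
```

With the stale rows included, `a5` appears to wait on four datasets and `a3_a8` / `a2_a4` show the lopsided counts from cases c) and d). After pruning to references that the visible dags currently declare, `a5` waits on one dataset and the removed datasets drop to 0/0.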
   
   
   ### How to reproduce
   
   see "what happened"
   
   ### Operating System
   
   macOS
   
   ### Versions of Apache Airflow Providers
   
   n/a
   
   ### Deployment
   
   Virtualenv installation
   
   ### Deployment details
   
   virtualenv install from main at 46b87def77
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

