MatrixManAtYrService opened a new issue, #26256:
URL: https://github.com/apache/airflow/issues/26256

   ### Apache Airflow version
   
   2.4.0b1
   
   ### What happened
   
   I have [this test 
dag](https://gist.github.com/MatrixManAtYrService/2cf0ebbd85faa2aac682d9c441796c58)
 which I created to report [this 
issue](https://github.com/apache/airflow/issues/25210).  The idea is that if 
you unpause "sink" and all of the "sources", then the sources will wait until 
the clock reads \*:\*:00 and they'll all terminate at the same time.  
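
For readers who don't open the gist, the "wait until the clock reads \*:\*:00" step could look roughly like the sketch below. This is an assumption about the approach, not the code from the gist; `seconds_until_next_minute` is a hypothetical helper name.

```python
from datetime import datetime, timedelta


def seconds_until_next_minute(now: datetime) -> float:
    """Return how long to sleep so the wake-up lands on the next *:*:00."""
    # Truncate to the start of the current minute, then step one minute ahead.
    next_minute = now.replace(second=0, microsecond=0) + timedelta(minutes=1)
    return (next_minute - now).total_seconds()


# A source task would sleep for this long before finishing, so that every
# source started within the same minute terminates at the same boundary.
delay = seconds_until_next_minute(datetime(2022, 9, 8, 18, 7, 44))
```

Sources that start in different minutes end on different boundaries, which is what the "straggler" case later in this report relies on.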
   
   Since each source triggers the sink with a dataset called "counter", the 
"sink" dag will run just once, and it will have output like `INFO - [(16, 
1)]`: that's 16 sources and 1 sink that ran.
   
   At this point, you can look at the dataset history for "counter" and you'll 
see this:
   
   <img width="524" alt="Screen Shot 2022-09-08 at 6 07 44 PM" 
src="https://user-images.githubusercontent.com/5834582/189248999-d31141a4-2d0b-4ec2-9ea5-c4c3536b3a28.png">
   
   So we've got a timestamp, but the "triggered runs" count is empty.  That's 
weird.  One run was triggered (and it finished by the time the screenshot was 
taken), so why doesn't it say `1`?
   
   So I redeploy and try it again, except this time I wait several seconds 
between each "unpause" click, the idea being that maybe some of them fire at 
07:16:00 and the others fire at 07:17:00.  I end up with this:
   
   <img width="699" alt="Screen Shot 2022-09-08 at 6 19 12 PM" 
src="https://user-images.githubusercontent.com/5834582/189252116-69067189-751d-40e7-89c5-8d1da1720237.png">
   
   So fifteen of them finished at once and caused the dataset to update, and 
then just one straggler (number 9)  is waiting for an additional minute.  I 
wait for the straggler to complete and go back to the dataset view:
   
   <img width="496" alt="Screen Shot 2022-09-08 at 6 20 41 PM" 
src="https://user-images.githubusercontent.com/5834582/189253874-87bb3eb3-2237-42a1-bc3f-9fc210419f1a.png">
   
   Now it's the straggler that is blank, but the rest of them are populated.  
Continuing to manually run these, I find that whichever one I have run most 
recently is blank, and all of the others are 1, even if this is the second or 
third time I've run them.
   
   
   
   ### What you think should happen instead
   
   - The triggered runs counter should increment beyond 1
   - It should increment immediately after the dag was triggered, not wait 
until after the *next* dag gets triggered.
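
To make that expectation concrete, here is a toy model of the bookkeeping I have in mind. It is only a sketch of the expected behaviour under my reading of the dataset view, not Airflow's implementation; `ExpectedCounterModel` and the name `source_9` are made up for illustration.

```python
from collections import Counter


class ExpectedCounterModel:
    """Toy model of the expected counter; not how Airflow stores this."""

    def __init__(self) -> None:
        self.pending: list[str] = []      # sources updated since the last sink run
        self.triggered_runs = Counter()   # source name -> sink runs it triggered

    def source_updated(self, source: str) -> None:
        # A source finished and updated the "counter" dataset.
        self.pending.append(source)

    def sink_triggered(self) -> None:
        # Credit every pending update right away, when the sink run is created,
        # rather than when some later run comes along.
        for source in self.pending:
            self.triggered_runs[source] += 1
        self.pending.clear()


model = ExpectedCounterModel()
for _ in range(3):  # run the same source three times
    model.source_updated("source_9")
    model.sink_triggered()
```

In this model the most recently run source is never blank, and running a source three times leaves its count at 3 rather than 1.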
   
   ### How to reproduce
   
   See the dags in this gist: 
https://gist.github.com/MatrixManAtYrService/2cf0ebbd85faa2aac682d9c441796c58
   
   1. unpause "sink"
   2. unpause half of sources
   3. wait one minute
   4. unpause the other half of the sources
   5. wait for "sink" to run a second time
   6. view the dataset history for "counter"
   7. ask why only half of them are populated
   8. manually trigger some sources, wait for them to trigger sink
   9. view the dataset history again
   10. ask why none of them show more than 1 dagrun triggered @ @
   
   ### Operating System
   
   Kubernetes in Docker, deployed via helm
   
   ### Versions of Apache Airflow Providers
   
   n/a
   
   ### Deployment
   
   Other 3rd-party Helm chart
   
   ### Deployment details
   
   see "deploy.sh" in the gist: 
   
   https://gist.github.com/MatrixManAtYrService/2cf0ebbd85faa2aac682d9c441796c58
   
   It's just a fresh install into a k8s cluster
   
   ### Anything else
   
   n/a
   
   ### Are you willing to submit PR?
   
   - [ ] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
