shai-ikko opened a new issue, #40841: URL: https://github.com/apache/airflow/issues/40841
### Apache Airflow version

main (development)

### If "Other Airflow 2 version" selected, which one?

_No response_

### What happened?

Airflow has two `start_date` fields on separate structures: the Task and the Task Instance. The code that interprets the `wait_for_downstream` flag is part of `prev_dagrun_dep.py`; before it checks for the presence of unfinished downstream tasks, though, it looks at the task `start_date`, comparing it to the execution date of the last DAG run -- which seems perfectly logical:

https://github.com/apache/airflow/blob/63662044583031fc27d98af02f2913d324245db0/airflow/ti_deps/deps/prev_dagrun_dep.py#L159-L163

However, I'm seeing the task `start_date` on a sensor updated with each new DAG run. As a result, it is never less than the execution date of the last DAG run -- Airflow always thinks every instance is "the first instance of its task". I encountered this using `S3KeySensor`, but I've checked its code and it doesn't touch `start_date`; nor does any other task in my DAG, AFAICT. I've been able to reproduce the issue in a DAG I can share that doesn't require access to an S3 service.

The problem is also reflected in the web UI: when I open the details of a task instance after some later runs have occurred and click "more details", I see an instance start date that is older than the task start date, because the task start date was updated later. I included more detailed investigations in the discussion: https://github.com/apache/airflow/discussions/40451

### What you think should happen instead?

As a result of the wrong task `start_date`, `wait_for_downstream` doesn't work. I have a DAG where the sensor is marked `wait_for_downstream=True`, looking for a file with a given suffix to show up. The task immediately downstream from the sensor renames such files, giving them a different suffix.
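For illustration, the guard described under "What happened?" boils down to a plain datetime comparison. This is a paraphrase of the linked check, not Airflow's actual code; the function name and dates are made up:

```python
from datetime import datetime, timedelta, timezone

def treated_as_first_instance(task_start_date, last_dagrun_execution_date):
    # Paraphrase of the guard in prev_dagrun_dep.py: if the task "started"
    # after the previous DAG run's execution date, the TI is considered the
    # first instance of its task and the wait_for_downstream check is skipped.
    return task_start_date > last_dagrun_execution_date

utc = timezone.utc
last_run = datetime(2024, 7, 1, 12, 0, tzinfo=utc)

# Expected behaviour: a task start_date set long before the previous run,
# so the dep is enforced.
original_start = datetime(2024, 1, 1, tzinfo=utc)
print(treated_as_first_instance(original_start, last_run))  # False

# Observed behaviour: task.start_date gets bumped on every run, so it is
# always newer than the previous run's execution date and the dep is
# always skipped.
bumped_start = last_run + timedelta(minutes=5)
print(treated_as_first_instance(bumped_start, last_run))  # True
```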
Because of the failure, the next instance of the sensor often finds the file before it is renamed, but of course only one of the renaming task instances can succeed; the other one fails, which triggers all sorts of error handling.

### How to reproduce

Take the DAG from https://gist.github.com/shai-ikko/45fc6ae32556fbed519a0b2a3007d8a2 -- it has some more downstream processing after the problem is triggered, but I think it helps to clarify the issue. Using Breeze, put this DAG in `files/dags/` and create a directory `files/tmp/`. Start the DAG. To trigger runs, create files with the suffix `.incoming` in the tmp directory; e.g., from outside the Docker containers:

```console
$ touch files/tmp/use-zx81.incoming
```

Now look at the DAG in the web interface. Following the touch, I see the sensor (`detect_incoming`) succeed twice, sometimes even three times, but the next task (`send_to_processing`) can only succeed once. Also, click one of the earlier TI runs and check its details; then click "more details" and compare the `start_date` in the Task Instance Attributes to the `start_date` in the Task Attributes. You'll find that the former is earlier than the latter, and you may find the latter matching the start date of the last DAG run at the time you're looking.

### Operating System

Ubuntu 20.04.6 LTS

### Versions of Apache Airflow Providers

No relevant providers

### Deployment

Other Docker-based deployment

### Deployment details

```console
$ breeze down
Good version of Docker: 27.0.3.
Good version of docker-compose: 2.28.1
```

### Anything else?

This happens to me every time. It may be relevant that my native system time is UTC+3 -- in some of the logs I've seen times reported matching UTC, and in some logs the times were 6 hours off, as it seems the native time in the containers is UTC and something is trying to compensate for that.

### Are you willing to submit PR?

- [X] Yes I am willing to submit a PR!
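Outside Airflow entirely, the downstream failure mode from the reproduction above reduces to a plain filesystem race. A minimal sketch (the file name matches the example above; the `.renamed` suffix and the two-instance framing are illustrative):

```python
import os
import tempfile

# Two "renaming" task instances both detected the same .incoming file,
# but only the first rename can succeed.
tmpdir = tempfile.mkdtemp()
src = os.path.join(tmpdir, "use-zx81.incoming")
open(src, "w").close()

dst = src[: -len(".incoming")] + ".renamed"
os.rename(src, dst)  # first task instance: succeeds

second_rename_failed = False
try:
    os.rename(src, dst)  # second task instance: the file is already gone
except FileNotFoundError:
    second_rename_failed = True

print(second_rename_failed)  # True
```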
### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
