mpeteuil opened a new issue #14969: URL: https://github.com/apache/airflow/issues/14969
**Apache Airflow version**: 1.8 - 2.0.1 (tested against 1.10.4, 1.10.15, 2.0.1)

**Kubernetes version (if you are using kubernetes)** (use `kubectl version`): N/A

**Environment**:

- **Cloud provider or hardware configuration**:
- **OS** (e.g. from /etc/os-release):
- **Kernel** (e.g. `uname -a`):
- **Install tools**:
- **Others**: Python 2.7.16, 3.7.6 (I don't think this is a factor)

**What happened**:

There is an issue with the scheduling of DAGs that use a `timedelta` object as the DAG `schedule_interval` argument while also having `catchup` set to `False`. If a DAG meets those criteria, then when it's turned on it ignores the time component of the start date and runs immediately.

This was previously reported in [[AIRFLOW-1156]](https://issues.apache.org/jira/browse/AIRFLOW-1156) and was closed with https://github.com/apache/airflow/pull/8776, which fixed the two dag runs problem that was also mentioned in that issue.

**What you expected to happen**:

I expect it to behave the same as a DAG using a cron expression for the `schedule_interval` under otherwise identical conditions (i.e. `catchup` still set to `False`).

I believe this is a result of how [`Dag#following_schedule` and `Dag#previous_schedule` are implemented](https://github.com/apache/airflow/blob/1.10.15/airflow/models/dag.py#L409-L463). I traced the `SchedulerJob#create_dag_run` method and I believe this is due to the `Dag` methods used in there.

**How to reproduce it**:

Create two DAGs with `catchup` set to `False` that are exactly the same, except that one uses a `timedelta` object as the `schedule_interval` argument and the other uses a cron expression. Set a `start_date` sometime in the past. Turn them both on and you should see that the one with a `timedelta` as the `schedule_interval` has disregarded the time part of the `start_date` and used the current time when it started executing as the time part of the `execution_date`.
The version using the cron expression will have used the time from the cron expression.

Example DAG:

```py
import datetime as dt

from airflow import DAG
from airflow.operators.dummy_operator import DummyOperator

dag_params = {
    'dag_id': 'schedule_interval_timedelta_bug_example',
    'default_args': {
        'owner': 'Administrator',
        'depends_on_past': False,
        'retries': 0,
        'email': ['[email protected]']
    },
    'schedule_interval': dt.timedelta(days=1),
    'start_date': dt.datetime(year=2021, month=1, day=1, hour=11, minute=10),
    'catchup': False
}

with DAG(**dag_params) as dag:
    DummyOperator(task_id='start') >> DummyOperator(task_id='end')
```

For the cron version just change the `schedule_interval` to `10 11 * * *`.

Here's a screenshot of this happening on 2.0.1 (although the bug exists in much older versions as well):

**Anything else we need to know**:

I've only tested this on DAGs that have a 1 day schedule interval, but testing with other intervals could reveal whether this is a problem at finer-grained intervals or whether it's isolated to daily runs. Based on what I saw in `Dag#following_schedule` and `Dag#previous_schedule`, I suspect this would be a problem with shorter intervals as well.

Tested with the `SequentialExecutor` and `StandardTaskRunner`, which I don't _think_ are a factor, but it's certainly possible.

Happy to provide other details or help in any way.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
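To make the suspected mechanism in the report above concrete, here is a minimal pure-Python sketch of the difference between stepping a `timedelta` schedule from "now" and snapping to a cron time. It is a simplified model and not Airflow's actual scheduler code: the three helper functions and the sample `now` value are hypothetical names and data chosen for illustration, and the "anchored" variant is only one possible reading of the expected behavior, not a proposed patch.

```python
import datetime as dt


def previous_timedelta_schedule(moment, interval):
    """Step back one interval from `moment`.

    The result inherits the time of day of `moment`, not of any start date.
    (Hypothetical helper modelling a purely relative timedelta schedule.)
    """
    return moment - interval


def previous_daily_cron_schedule(moment, hour, minute):
    """Most recent daily HH:MM strictly before `moment`.

    Hypothetical helper modelling a daily cron expression such as `10 11 * * *`.
    """
    candidate = moment.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate >= moment:
        candidate -= dt.timedelta(days=1)
    return candidate


def previous_anchored_timedelta_schedule(moment, start_date, interval):
    """Most recent `start_date + k * interval` strictly before `moment`.

    Hypothetical helper: one way a timedelta schedule could keep the time
    component of `start_date`, as the report expects.
    """
    steps = (moment - start_date) // interval  # whole intervals elapsed
    candidate = start_date + steps * interval
    if candidate >= moment:
        candidate -= interval
    return candidate


# Values mirroring the example DAG; `now` is an arbitrary assumed "turned on" time.
start_date = dt.datetime(year=2021, month=1, day=1, hour=11, minute=10)
interval = dt.timedelta(days=1)
now = dt.datetime(2021, 3, 23, 15, 42, 7)

timedelta_run = previous_timedelta_schedule(now, interval)
cron_run = previous_daily_cron_schedule(now, hour=11, minute=10)
anchored_run = previous_anchored_timedelta_schedule(now, start_date, interval)

print(timedelta_run)  # 2021-03-22 15:42:07 (time of day taken from `now`)
print(cron_run)       # 2021-03-23 11:10:00 (time of day taken from the cron expression)
print(anchored_run)   # 2021-03-23 11:10:00 (time of day taken from `start_date`)
```

In this model the purely relative `timedelta` step carries over the time of day of whatever moment it is given, which matches the observed `execution_date`, while the cron and anchored variants both land on 11:10.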
