[ https://issues.apache.org/jira/browse/AIRFLOW-5538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
kasim updated AIRFLOW-5538:
---------------------------
    Description:

From [https://airflow.apache.org/scheduler.html]:

> Note that if you run a DAG on a schedule_interval of one day, the run
> stamped 2016-01-01 will be triggered soon after 2016-01-01T23:59. In
> other words, the job instance is started once the period it covers has
> ended.

This behavior is very painful. For example, I have an ETL job that runs every day with schedule_interval `0 1 * * *`, so the run stamped 2019-09-22 01:00:00 is actually triggered at 2019-09-23 01:00:00.

But my ETL processes all data from before the run's start time, i.e. the data range is (history, 2019-09-23 00:00:00), and I can't use `datetime.now()` because that would not be reproducible. This forces me to add one day to execution_date:

```python
etl_end_time = "{{ (execution_date + macros.timedelta(days=1)).strftime('%Y-%m-%d 00:00:00') }}"
```

However, when I need to run a job with schedule_interval `45 2,3,4,5,6 * * *`, the `2019-09-22 06:45:00` run is triggered at `2019-09-23 02:45:00`, which is the day after its execution_date. Instead of adding a day, I had to change the schedule_interval to `45 2,3,4,5,6,7 * * *` and put a dummy operator on the last run. In that situation you must not add one day to execution_date, which means you have to define two different `etl_end_time` expressions to represent the same date in jobs with different schedule_intervals.

All of this is very uncomfortable for me. Adding a config option or built-in method to make execution_date equal to start_date would be very nice.


> Add a flag to make scheduling trigger on start_date instead of execution_date
> (make execution_date equal to start_date)
> -----------------------------------------------------------------------------
>
>                 Key: AIRFLOW-5538
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-5538
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: DagRun
>    Affects Versions: 1.10.5
>            Reporter: kasim
>            Priority: Major
>

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
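The scheduling rule quoted from the docs can be sketched in plain Python (a simplified model for illustration, not actual Airflow code): a run stamped with a given execution_date is only triggered once the interval it covers has ended.

```python
from datetime import datetime, timedelta

# Simplified model of the scheduling rule quoted above (not Airflow code):
# the run stamped with execution_date is triggered once the interval it
# covers has ended, i.e. at execution_date + schedule_interval.
schedule_interval = timedelta(days=1)          # daily, as with `0 1 * * *`
execution_date = datetime(2019, 9, 22, 1, 0)   # the run's stamp

trigger_time = execution_date + schedule_interval
print(trigger_time)  # 2019-09-23 01:00:00
```

This is exactly the one-interval lag the issue describes: the stamp and the actual start time differ by one schedule_interval.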
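For the daily DAG, the `etl_end_time` template in the description effectively renders to the value computed below (a plain-Python sketch of what the Jinja expression evaluates to, without Airflow's macro machinery):

```python
from datetime import datetime, timedelta

# What the `etl_end_time` template evaluates to for the daily run
# stamped 2019-09-22 01:00:00 (plain Python, no Jinja/Airflow macros):
execution_date = datetime(2019, 9, 22, 1, 0)
etl_end_time = (execution_date + timedelta(days=1)).strftime('%Y-%m-%d 00:00:00')
print(etl_end_time)  # 2019-09-23 00:00:00
```

Note the rendered end time matches the day the run actually starts, which is why the +1 day workaround is needed at all for the daily schedule.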