[ 
https://issues.apache.org/jira/browse/AIRFLOW-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16479516#comment-16479516
 ] 

Grant West commented on AIRFLOW-2341:
-------------------------------------

Which way is intuitive depends on what you are doing. There are use cases for 
triggering at both the start and the end of the intervals. 

If you are doing traditional ETL work, it makes sense to trigger at the end of 
each interval. A daily job that needs all of the day's data to be ready then 
runs only once the interval has closed, while the execution_date for the run 
is still the beginning of that day, so the dates in the GUI make a lot of 
sense. 
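
To make the end-of-interval case concrete, here is a rough sketch in plain 
Python (no Airflow involved). The helper name `end_of_interval_run` is made up 
for illustration, and it assumes a fixed one-day interval:

```python
from datetime import datetime, timedelta


def end_of_interval_run(execution_date, interval=timedelta(days=1)):
    """Return (window_start, window_end, trigger_time) for one run.

    Hypothetical helper: the run is labelled with the start of its
    interval but only fires once the interval has ended.
    """
    window_start = execution_date
    window_end = execution_date + interval
    trigger_time = window_end  # end-of-interval triggering
    return window_start, window_end, trigger_time


window_start, window_end, trigger_time = end_of_interval_run(datetime(2018, 4, 1))
print("execution_date:", window_start)  # 2018-04-01 00:00:00
print("triggered at:  ", trigger_time)  # 2018-04-02 00:00:00
```

The run labelled 2018-04-01 covers that whole day and fires at midnight on 
2018-04-02, which is why the dates shown in the GUI line up with the data.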

If you are doing something like generating a report every day at 2pm, then the 
job isn't necessarily related to the last 24 hours of data and so it makes more 
sense to trigger at the beginning of the interval. 

My suggestion would be to have another configuration option to use alongside 
`schedule_interval`, something like this:
{code:python}
DAG(
    dag_id,
    default_args=default_args,
    catchup=True,
    schedule_interval='0 0 5 * *',
    # proposed new option, not an existing DAG argument
    trigger='start_of_interval'
)
{code}
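
As a rough illustration of what the two values could mean for the schedule in 
the issue description, here is a self-contained sketch. It does not use 
Airflow or a cron library: the cron expression '0 0 5 * *' is hard-coded as 
"midnight on the 5th of each month", and the function names and the 
'end_of_interval' value are assumptions made up for this example.

```python
from datetime import datetime


def fifth_after(tick):
    """Firing of '0 0 5 * *' that follows `tick` (the 5th of the next month)."""
    if tick.month == 12:
        return datetime(tick.year + 1, 1, 5)
    return datetime(tick.year, tick.month + 1, 5)


def fifth_on_or_after(moment):
    """First firing of '0 0 5 * *' at or after `moment`."""
    tick = datetime(moment.year, moment.month, 5)
    return tick if tick >= moment else fifth_after(tick)


def first_trigger(start_date, trigger="end_of_interval"):
    """When the first run would fire under the hypothetical `trigger` option."""
    interval_start = fifth_on_or_after(start_date)
    if trigger == "start_of_interval":
        return interval_start
    if trigger == "end_of_interval":
        return fifth_after(interval_start)
    raise ValueError("unknown trigger mode: %r" % (trigger,))


start = datetime(2018, 4, 1)
print(first_trigger(start, "start_of_interval"))  # 2018-04-05 00:00:00
print(first_trigger(start, "end_of_interval"))    # 2018-05-05 00:00:00
```

With 'end_of_interval' the first run fires on 2018-05-05, matching the 
behaviour described in the issue, while 'start_of_interval' gives the 
2018-04-05 date the reporter expected.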

> Trigger timing should follow cron style when we use cron style 
> schedule_interval
> --------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-2341
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2341
>             Project: Apache Airflow
>          Issue Type: Wish
>          Components: scheduler
>            Reporter: Yu Ishikawa
>            Priority: Minor
>
> I understand "The Airflow scheduler triggers the task soon after the 
> {{start_date + scheduler_interval}} is passed." in 
> [FAQ|https://airflow.apache.org/faq.html#why-isn-t-my-task-getting-scheduled].
>  However, in my opinion, the trigger timing is confusing.
> For example, when I set start_date to 2018-04-01 and set schedule_interval to 
> '0 0 5 * *' in a DAG, the DAG will be triggered on 2018-05-05. In general, we 
> would expect the job to be triggered on 2018-04-05.
> If we adopt cron style scheduling, it should follow the trigger timing of 
> the cron style as well. Otherwise, we should have another option for cron 
> style scheduling, instead of schedule_interval. What do you think?
> {noformat}
> default_args = {
>     'start_date': datetime(2018, 4, 1),
> }
> dag_id = get_file_name(__file__)
> dag = DAG(
>     dag_id,
>     default_args=default_args,
>     catchup=True,
>     schedule_interval='0 0 5 * *'){noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
