[
https://issues.apache.org/jira/browse/AIRFLOW-2341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16479516#comment-16479516
]
Grant West commented on AIRFLOW-2341:
-------------------------------------
Which way is intuitive depends on what you are doing. There are use cases for
triggering at both the start and the end of the intervals.
If you are doing traditional ETL work, it makes sense to trigger at the end of
each interval. A daily job that needs the full day's data can only run once
the interval has closed, while the execution_date for the run is the start of
that day, so the dates shown in the GUI match the data being processed.
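To make the end-of-interval behaviour concrete, here is a minimal sketch using only the standard library. It does not call Airflow; the helper name `end_of_interval_run` is made up for illustration, and a fixed one-day `timedelta` stands in for a daily schedule.

```python
from datetime import datetime, timedelta


def end_of_interval_run(interval_start, interval):
    """Hypothetical helper: model a run that is triggered when its interval closes.

    Returns (execution_date, trigger_time), where execution_date is the start
    of the interval and trigger_time is the moment the interval ends.
    """
    execution_date = interval_start
    trigger_time = interval_start + interval
    return execution_date, trigger_time


# A daily ETL run covering 2018-05-01: labelled with the start of the day,
# but only triggered once the whole day of data is available.
execution_date, trigger_time = end_of_interval_run(
    datetime(2018, 5, 1), timedelta(days=1)
)
print("execution_date:", execution_date)
print("triggered at:  ", trigger_time)
```

The run is labelled with the day whose data it processes, even though it starts a day later.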
If you are doing something like generating a report every day at 2pm, then the
job isn't necessarily related to the last 24 hours of data and so it makes more
sense to trigger at the beginning of the interval.
My suggestion would be to add another configuration option to use alongside
`schedule_interval`, something like this:
{code:python}
DAG(
    dag_id,
    default_args=default_args,
    catchup=True,
    schedule_interval='0 0 5 * *',
    trigger='start_of_interval',
)
{code}
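As a rough sketch of what such an option might do, the snippet below computes the first trigger time for the schedule in the issue ('0 0 5 * *' with a start_date of 2018-04-01) under both modes. Nothing here is an existing Airflow API: `trigger` is only the option proposed in this comment, the value name 'end_of_interval' is an assumption for the current behaviour, and `next_fifth_of_month` is a hand-written stand-in for a real cron parser.

```python
from datetime import datetime, timedelta


def next_fifth_of_month(after):
    """Stand-in for cron '0 0 5 * *': first 'day 5, 00:00' at or after `after`."""
    candidate = datetime(after.year, after.month, 5)
    if candidate >= after:
        return candidate
    # The 5th of this month has already passed, so roll over to next month.
    if after.month == 12:
        return datetime(after.year + 1, 1, 5)
    return datetime(after.year, after.month + 1, 5)


def first_trigger_time(start_date, trigger="end_of_interval"):
    """Hypothetical: when the first run would fire under each proposed mode."""
    interval_start = next_fifth_of_month(start_date)
    if trigger == "start_of_interval":
        # Fire as soon as the first interval begins (plain cron behaviour).
        return interval_start
    if trigger == "end_of_interval":
        # Fire once the first interval has closed, i.e. at the next cron tick.
        return next_fifth_of_month(interval_start + timedelta(days=1))
    raise ValueError("unknown trigger mode: %r" % trigger)


start_date = datetime(2018, 4, 1)
print(first_trigger_time(start_date, "start_of_interval"))
print(first_trigger_time(start_date, "end_of_interval"))
```

Under these assumptions, 'start_of_interval' gives the 2018-04-05 trigger the reporter expected, while 'end_of_interval' gives the 2018-05-05 trigger described in the issue.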
> Trigger timing should follow cron style when we use cron style
> schedule_interval
> --------------------------------------------------------------------------------
>
> Key: AIRFLOW-2341
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2341
> Project: Apache Airflow
> Issue Type: Wish
> Components: scheduler
> Reporter: Yu Ishikawa
> Priority: Minor
>
> I understand "The Airflow scheduler triggers the task soon after the
> {{start_date + scheduler_interval}} is passed." in
> [FAQ|https://airflow.apache.org/faq.html#why-isn-t-my-task-getting-scheduled].
> However, in my opinion, the trigger timing is confusing.
> For example, when I set start_date to 2018-04-01 and set schedule_interval to
> '0 0 5 * *' in a DAG, the DAG will be triggered on 2018-05-05. In general, we
> would expect the job to be triggered on 2018-04-05.
> If we adopt cron style scheduling, it should follow the trigger timing of
> cron as well. Otherwise, we should have a separate option for cron style
> scheduling, instead of schedule_interval. What do you think?
> {noformat}
> default_args = {
> 'start_date': datetime(2018, 4, 1),
> }
> dag_id = get_file_name(__file__)
> dag = DAG(
> dag_id,
> default_args=default_args,
> catchup=True,
> schedule_interval='0 0 5 * *'){noformat}