Hi everyone,

I'd like to (re)start the discussion about a feature I want to add in Airflow 2.1, which I am loosely calling "improving schedule_interval" (catchy name, I know!).

I have two main high-level goals in mind here:

1. To reduce the confusion around execution_date (specifically the naming of the parameter!) - the whole start vs end discussion.
2. To support more complex schedules.

The previous thread on point 1 is here: <https://lists.apache.org/thread.html/2b12ae265795ff2e655a5161c972f5c7bbe60722a12849a0e2c5c55f%40%3Cdev.airflow.apache.org%3E> (but I'm taking a bit of a step back from that to think about whether there's a bigger change we could make that encompasses it).


I don't yet have a concrete plan or implementation in mind, but I'd like to start collecting people's "wish lists" when it comes to scheduling DAGs:

- What do you wish you could express natively in terms of scheduling your DAGs? (I.e. without using "hacks" such as date sensor/skip tasks at start)
- What schedules do you wish you could express now, that you just can't?
- Do you have example workflows that show where you'd want a run scheduled at the start of its interval? Follow-up question: would you also want this to differ between DAGs in the same Airflow install?


Existing issues:
- <https://github.com/apache/airflow/issues/8649> "Add support for more than 1 cron exp per DAG"
- <https://github.com/apache/airflow/issues/10194> "Ability to better support odd scheduling time"
- <https://github.com/apache/airflow/issues/10449> "Dynamic Schedule Intervals"
- <https://github.com/apache/airflow/issues/10123> "Job Schedule Interval on 2nd & 4th Tuesday"

I'll start:

Case 1:

One example that came up recently in Slack was an actual astronomer wanting a DAG to run with a schedule of "@sunset"! This also brings up the subject of running DAGs at interval start or end.

Case 2:

I'd like to be able to run a daily process at the end of each weekday, i.e. to process data for Monday..Friday. The naive way of expressing this would be "0 0 * * MON-FRI", but that means the DAG would run on Tuesday, Wednesday, Thursday, Friday and Monday, so Friday's data isn't processed until Monday!
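
To make the problem concrete, here is a rough sketch in plain Python (no Airflow imports; the helper names `weekday_ticks` and `runs` are made up for illustration). It assumes the current behaviour, where a run for a given execution_date only starts once the next tick of the schedule arrives:

```python
from datetime import date, timedelta


def weekday_ticks(start, days):
    """Dates on which "0 0 * * MON-FRI" fires (midnight, Monday to Friday)."""
    candidates = (start + timedelta(days=i) for i in range(days))
    return [d for d in candidates if d.weekday() < 5]  # Monday=0 .. Friday=4


def runs(ticks):
    """Pair each tick (the execution_date) with the next tick (when the run starts)."""
    return list(zip(ticks, ticks[1:]))


# 2020-08-17 is a Monday; look at eight days so the following Monday is included.
ticks = weekday_ticks(date(2020, 8, 17), 8)
for execution_date, run_at in runs(ticks):
    print(f"data for {execution_date:%a %Y-%m-%d} -> run starts {run_at:%a %Y-%m-%d} 00:00")
```

The last pair is Friday's data with a run that starts on the following Monday, three days later.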

My thinking is that we need to separate the schedule interval (when to run a task) from the period duration (i.e. how much data to look at, such as one day's worth).
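
As a very rough sketch of what that separation might look like (purely illustrative: `PlannedRun` and `weekday_runs` are hypothetical names, not an existing or proposed Airflow API), each run could carry its data period and its trigger time as separate values:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class PlannedRun:
    # Hypothetical structure for illustration only, not an Airflow class.
    period_start: datetime  # start of the data being processed
    period_end: datetime    # end of the data being processed
    run_at: datetime        # when the run should actually start


def weekday_runs(first_day, days):
    """One run per weekday: covers that day, triggered as soon as the day ends."""
    planned = []
    for i in range(days):
        start = first_day + timedelta(days=i)
        if start.weekday() < 5:  # Monday=0 .. Friday=4
            end = start + timedelta(days=1)
            planned.append(PlannedRun(start, end, run_at=end))
    return planned


# Week starting Monday 2020-08-17.
for run in weekday_runs(datetime(2020, 8, 17), 7):
    print(f"period {run.period_start:%a} -> run at {run.run_at:%a %H:%M}")
```

Under this assumption, Friday's period is triggered at Saturday 00:00 rather than waiting for the next Monday tick, because the trigger time no longer has to be a tick of the same cron expression.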

Thanks,
Ash



