josh-fell commented on a change in pull request #19237:
URL: https://github.com/apache/airflow/pull/19237#discussion_r737847953
##########
File path:
airflow/providers/amazon/aws/example_dags/example_dms_full_load_task.py
##########
@@ -50,20 +49,14 @@
with DAG(
dag_id='dms_full_load_task_run_dag',
- default_args={
- 'owner': 'airflow',
- 'depends_on_past': False,
- 'email': ['[email protected]'],
- 'email_on_failure': False,
- 'email_on_retry': False,
- },
dagrun_timeout=timedelta(hours=2),
- start_date=days_ago(2),
+ start_date=datetime(2021, 1, 1),
schedule_interval='0 3 * * *',
+ catchup=False,
Review comment:
> What was the motivation for moving to static start dates?
For a while now, using a static `start_date` value has been communicated as a
best practice when authoring DAGs, because it avoids the common headaches
caused by how `start_date` and `schedule_interval` interact when a dynamic
value is used. Unfortunately, the docs, examples, and code snippets do not
reflect this guidance and are probably contributing to those headaches.
Hopefully setting the `start_date` far in the past will make it more
obvious that the value should change if folks copy this DAG.
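
To illustrate the point about dynamic values, here is a minimal stdlib-only sketch (no Airflow import). `dynamic_start_date` is a hypothetical stand-in that approximates what a `days_ago(2)`-style helper computes, and the parse times are made-up values for illustration.

```python
from datetime import datetime, timedelta


def dynamic_start_date(parse_time: datetime, n_days: int = 2) -> datetime:
    """Hypothetical stand-in for a days_ago(n)-style helper:
    midnight of the parse day, minus n_days."""
    midnight = parse_time.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - timedelta(days=n_days)


# A static value is the same no matter when the DAG file is parsed.
STATIC_START_DATE = datetime(2021, 1, 1)

# Pretend the same DAG file is parsed on three consecutive days (assumed dates).
parse_times = [
    datetime(2021, 10, 27, 9, 0),
    datetime(2021, 10, 28, 9, 0),
    datetime(2021, 10, 29, 9, 0),
]

dynamic_values = [dynamic_start_date(t) for t in parse_times]
static_values = [STATIC_START_DATE for _ in parse_times]

for parsed_at, dyn in zip(parse_times, dynamic_values):
    print(f"parsed {parsed_at:%Y-%m-%d}: dynamic={dyn:%Y-%m-%d} static={STATIC_START_DATE:%Y-%m-%d}")
```

The dynamic value shifts forward with every parse, so the first schedulable interval is a moving target, while the static value stays put.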
> Should we add catchup=False to all examples to help new users avoid the
catchup explosion?
I feel like there are plenty of examples in the repo, in the docs, and in the
wild where `catchup=False` is used, but maybe the real question is "why don't
new users know about `catchup` early on?" It might be better to present this
concept/parameter more explicitly for new users in the tutorial documentation.
WDYT?
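
For context on the "catchup explosion", here is a rough stdlib-only model of how many runs a daily 03:00 schedule could accumulate. It is a simplification for illustration, not the scheduler's actual logic; `completed_daily_intervals`, `runs_created`, and the `now` value are all hypothetical.

```python
from datetime import datetime, timedelta


def completed_daily_intervals(start_date: datetime, now: datetime, hour: int = 3) -> int:
    """Count whole daily intervals (ticking at `hour`:00) that have
    fully elapsed between start_date and now."""
    first_tick = start_date.replace(hour=hour, minute=0, second=0, microsecond=0)
    if first_tick < start_date:
        # start_date is after today's tick, so the first tick is tomorrow.
        first_tick += timedelta(days=1)
    if now < first_tick:
        return 0
    return (now - first_tick) // timedelta(days=1)


def runs_created(start_date: datetime, now: datetime, catchup: bool) -> int:
    """Simplified model: with catchup, every elapsed interval gets a run;
    without it, only the most recent elapsed interval does."""
    backlog = completed_daily_intervals(start_date, now)
    return backlog if catchup else min(backlog, 1)


start = datetime(2021, 1, 1)       # static start_date from the diff
now = datetime(2021, 10, 27, 12)   # assumed "first deployed" moment

with_catchup = runs_created(start, now, catchup=True)
without_catchup = runs_created(start, now, catchup=False)
print(f"catchup=True: {with_catchup} runs, catchup=False: {without_catchup} run(s)")
```

Under this model, a start date far in the past combined with the default catchup behavior yields a few hundred runs on first deploy, which is why pairing the static date with `catchup=False` matters in copy-pasted examples.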