ultrabug edited a comment on issue #2460: [AIRFLOW-1424] make the next execution date of DAGs visible
URL: https://github.com/apache/incubator-airflow/pull/2460#issuecomment-436295821

@XD-DENG bad news: the example DAG in the documentation breaks the scheduler on master, so even the documentation is wrong.

On a fresh installation, if I run the scheduler with this DAG:

```python
"""
Code that goes along with the Airflow tutorial located at:
https://github.com/airbnb/airflow/blob/master/airflow/example_dags/tutorial.py
"""
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from datetime import datetime, timedelta

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': datetime(2015, 12, 1),
    'email': ['[email protected]'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5),
    'schedule_interval': '@hourly',
}

dag = DAG('tutorial', catchup=False, default_args=default_args)
```

nothing happens: the scheduler does not pick anything up.

Now, if I change the `catchup` parameter to `True`:

```python
dag = DAG('tutorial', catchup=True, default_args=default_args)
```

the scheduler fails with:

```
Process DagFileProcessor1-Process:
Traceback (most recent call last):
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 267, in _bootstrap
    self.run()
  File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/home/alexys/github/incubator-airflow_numberly/airflow/jobs.py", line 395, in helper
    pickle_dags)
  File "/home/alexys/github/incubator-airflow_numberly/airflow/utils/db.py", line 74, in wrapper
    return func(*args, **kwargs)
  File "/home/alexys/github/incubator-airflow_numberly/airflow/jobs.py", line 1726, in process_file
    self._process_dags(dagbag, dags, ti_keys_to_schedule)
  File "/home/alexys/github/incubator-airflow_numberly/airflow/jobs.py", line 1426, in _process_dags
    dag_run = self.create_dag_run(dag)
  File "/home/alexys/github/incubator-airflow_numberly/airflow/utils/db.py", line 74, in wrapper
    return func(*args, **kwargs)
  File "/home/alexys/github/incubator-airflow_numberly/airflow/jobs.py", line 872, in create_dag_run
    if next_run_date > timezone.utcnow():
TypeError: can't compare datetime.datetime to NoneType
```

That `None` result is annoying even for the scheduler :)

EDIT: quoting the documentation for the expected behavior:

```
In the example above, if the DAG is picked up by the scheduler daemon on 2016-01-02 at 6 AM, (or from the command line), a single DAG Run will be created, with an execution_date of 2016-01-01, and the next one will be created just after midnight on the morning of 2016-01-03 with an execution date of 2016-01-02.

If the dag.catchup value had been True instead, the scheduler would have created a DAG Run for each completed interval between 2015-12-01 and 2016-01-02 (but not yet one for 2016-01-02, as that interval hasn’t completed) and the scheduler will execute them sequentially. This behavior is great for atomic datasets that can easily be split into periods. Turning catchup off is great if your DAG Runs perform backfill internally.
```
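The documented expectation boils down to interval arithmetic. Below is a minimal sketch of that arithmetic in plain Python 3; it models only the behavior the quoted documentation describes, not Airflow's scheduler code. `completed_execution_dates` is a hypothetical helper, and the sketch assumes the daily interval the quoted prose describes and that each run is labelled with the start of its interval (its execution date).

```python
from datetime import datetime, timedelta
from typing import List


def completed_execution_dates(start_date: datetime, interval: timedelta,
                              now: datetime, catchup: bool) -> List[datetime]:
    """Hypothetical model of the documented catchup semantics (not Airflow code).

    A run is labelled with the *start* of its interval and only becomes
    eligible once the whole interval [start, start + interval) has ended.
    """
    dates = []
    execution_date = start_date
    # Collect every interval whose end is at or before `now`.
    while execution_date + interval <= now:
        dates.append(execution_date)
        execution_date += interval
    if not catchup:
        # Without catchup, only the most recent completed interval is scheduled.
        return dates[-1:]
    return dates


start = datetime(2015, 12, 1)
now = datetime(2016, 1, 2, 6)  # picked up on 2016-01-02 at 6 AM
print(completed_execution_dates(start, timedelta(days=1), now, catchup=False))
# [datetime.datetime(2016, 1, 1, 0, 0)]
runs = completed_execution_dates(start, timedelta(days=1), now, catchup=True)
print(len(runs), runs[0].date(), runs[-1].date())
# 32 2015-12-01 2016-01-01
```

With `catchup=False` the sketch yields the single 2016-01-01 run the documentation promises; with `catchup=True` it yields one run per completed day from 2015-12-01 through 2016-01-01 and none yet for 2016-01-02. The model ignores runs that already exist, so it only describes which intervals are eligible at a given moment.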
