ultrabug edited a comment on issue #2460: [AIRFLOW-1424] make the next execution date of DAGs visible
URL: https://github.com/apache/incubator-airflow/pull/2460#issuecomment-436295821
 
 
   @XD-DENG bad news: the example DAG from the documentation breaks the scheduler on master, so even the documentation is wrong.
   
   On a fresh installation, if I run the scheduler with this DAG:
   
   ```python
   """
    Code that goes along with the Airflow tutorial located at:
    https://github.com/airbnb/airflow/blob/master/airflow/example_dags/tutorial.py
   """
   from airflow import DAG
   from airflow.operators.bash_operator import BashOperator
   from datetime import datetime, timedelta
   
   
   default_args = {
       'owner': 'airflow',
       'depends_on_past': False,
       'start_date': datetime(2015, 12, 1),
       'email': ['[email protected]'],
       'email_on_failure': False,
       'email_on_retry': False,
       'retries': 1,
       'retry_delay': timedelta(minutes=5),
       'schedule_interval': '@hourly',
   }
   
   dag = DAG('tutorial', catchup=False, default_args=default_args)
   ```
   
   nothing happens; the scheduler does not pick up anything.
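As an aside, `schedule_interval` is normally an argument of the `DAG` constructor itself rather than a task-level default, so placing it in `default_args` has no effect on scheduling. A sketch of the usual placement (same tutorial DAG, purely illustrative):

```python
from datetime import datetime

from airflow import DAG

# schedule_interval passed to the DAG itself, not via default_args
dag = DAG(
    'tutorial',
    start_date=datetime(2015, 12, 1),
    schedule_interval='@hourly',  # DAG-level schedule
    catchup=False,
)
```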
   
   Now, if I change the `catchup` parameter to `True`:
   
   ```python
   dag = DAG('tutorial', catchup=True, default_args=default_args)
   ```
   
   the scheduler fails with:
   
   ```
    Process DagFileProcessor1-Process:
    Traceback (most recent call last):
      File "/usr/lib64/python2.7/multiprocessing/process.py", line 267, in _bootstrap
        self.run()
      File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
        self._target(*self._args, **self._kwargs)
      File "/home/alexys/github/incubator-airflow_numberly/airflow/jobs.py", line 395, in helper
        pickle_dags)
      File "/home/alexys/github/incubator-airflow_numberly/airflow/utils/db.py", line 74, in wrapper
        return func(*args, **kwargs)
      File "/home/alexys/github/incubator-airflow_numberly/airflow/jobs.py", line 1726, in process_file
        self._process_dags(dagbag, dags, ti_keys_to_schedule)
      File "/home/alexys/github/incubator-airflow_numberly/airflow/jobs.py", line 1426, in _process_dags
        dag_run = self.create_dag_run(dag)
      File "/home/alexys/github/incubator-airflow_numberly/airflow/utils/db.py", line 74, in wrapper
        return func(*args, **kwargs)
      File "/home/alexys/github/incubator-airflow_numberly/airflow/jobs.py", line 872, in create_dag_run
        if next_run_date > timezone.utcnow():
    TypeError: can't compare datetime.datetime to NoneType
   ```
   
   That `None` result is annoying even the scheduler :)
   
   EDIT: quoting the documentation for the expected behavior:
   
   ```
    In the example above, if the DAG is picked up by the scheduler daemon on 2016-01-02 at 6 AM, (or from the command line), a single DAG Run will be created, with an execution_date of 2016-01-01, and the next one will be created just after midnight on the morning of 2016-01-03 with an execution date of 2016-01-02.

    If the dag.catchup value had been True instead, the scheduler would have created a DAG Run for each completed interval between 2015-12-01 and 2016-01-02 (but not yet one for 2016-01-02, as that interval hasn’t completed) and the scheduler will execute them sequentially. This behavior is great for atomic datasets that can easily be split into periods. Turning catchup off is great if your DAG Runs perform backfill internally.
   ```
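The catchup semantics quoted above can be modelled in a few lines. A toy sketch (not Airflow's implementation) of which execution dates a catchup=True scheduler should create, assuming a daily interval as in the documentation example:

```python
from datetime import datetime, timedelta

def completed_intervals(start_date, now, interval):
    """Toy model of catchup=True: return the execution_date of every
    schedule interval that has fully completed before `now`."""
    execution_dates = []
    execution_date = start_date
    # An interval starting at execution_date is complete once
    # execution_date + interval has passed
    while execution_date + interval <= now:
        execution_dates.append(execution_date)
        execution_date += interval
    return execution_dates
```

Picked up on 2016-01-02 at 6 AM with a daily schedule and a 2015-12-01 start date, this yields runs for 2015-12-01 through 2016-01-01, matching the quoted paragraph (no run yet for 2016-01-02, since that interval has not completed).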
