I've upgraded Airflow to today's master branch.
I hit the following regression when attempting to start a DAG:
Process DagFileProcessor209-Process:
Traceback (most recent call last):
  File "/opt/cloudera/parcels/Anaconda/lib/python2.7/multiprocessing/process.py", line 258, in _bootstrap
    self.run()
  File "/opt/cloudera/parcels/Anaconda/lib/python2.7/multiprocessing/process.py", line 114, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/airflow/airflow-20170506/src/airflow/airflow/jobs.py", line 346, in helper
    pickle_dags)
  File "/opt/airflow/airflow-20170506/src/airflow/airflow/utils/db.py", line 48, in wrapper
    result = func(*args, **kwargs)
  File "/opt/airflow/airflow-20170506/src/airflow/airflow/jobs.py", line 1584, in process_file
    self._process_dags(dagbag, dags, ti_keys_to_schedule)
  File "/opt/airflow/airflow-20170506/src/airflow/airflow/jobs.py", line 1173, in _process_dags
    dag_run = self.create_dag_run(dag)
  File "/opt/airflow/airflow-20170506/src/airflow/airflow/utils/db.py", line 48, in wrapper
    result = func(*args, **kwargs)
  File "/opt/airflow/airflow-20170506/src/airflow/airflow/jobs.py", line 776, in create_dag_run
    if next_start <= now:
TypeError: can't compare datetime.datetime to NoneType
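For what it's worth, the failing comparison can be reproduced outside Airflow. My assumption (not verified against the scheduler source) is that with schedule_interval='@once' the scheduler's notion of the next start date can legitimately be None, and jobs.py line 776 then compares None against a datetime:

```python
from datetime import datetime

# Assumption: next_start ends up None for an '@once' schedule.
next_start = None
now = datetime.now()

try:
    next_start <= now  # the comparison from create_dag_run
except TypeError as e:
    # Python 2 reports: "can't compare datetime.datetime to NoneType"
    # (Python 3 raises a similar "'<=' not supported ..." TypeError)
    print('TypeError: %s' % e)
```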
DAG definition:
main_dag = DAG(
    dag_id = 'DISCOVER-Oracle-Load-Mar2017-v1',
    default_args = default_args,          # default operators' arguments - see above
    user_defined_macros = dag_macros,     # I do not get the difference between
    ## params = dag_macros,               #   user_defined_macros and params
    start_date = datetime.now(),          # or e.g. datetime(2015, 6, 1)
    # 'end_date' = datetime(2016, 1, 1),
    catchup = False,                      # perform scheduler catchup (or only run latest)?
                                          #   - defaults to True
    schedule_interval = '@once',          # '@once'=None?
                                          #   doesn't create multiple dag runs automatically
    concurrency = 3,                      # task instances allowed to run concurrently
    max_active_runs = 1,                  # only one DAG run at a time
    dagrun_timeout = timedelta(days=4),   # no way this dag should run for 4 days
    orientation = 'TB',                   # default graph view
)
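Given the schedule_interval='@once' above, I'd guess next_start can legitimately be None there. A guard along these lines (my sketch of what I'd expect the scheduler to do, not the actual Airflow code) would avoid the crash:

```python
from datetime import datetime

def should_create_run(next_start, now):
    # Defensive sketch: '@once'-style schedules may have no next start
    # date at all, in which case no new run should be created.
    if next_start is None:
        return False
    return next_start <= now

print(should_create_run(None, datetime.now()))                   # -> False
print(should_create_run(datetime(2017, 1, 1), datetime.now()))   # -> True
```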
default_args:
default_args = {
    # Security:
    'owner' : 'rdautkha',                       # owner of the task, using the unix username is recommended
    # 'run_as_user' : None,                     # unix username to impersonate while running the task
    # Scheduling:
    'start_date' : None,                        # don't confuse with DAG's start_date
    'depends_on_past' : False,                  # True makes sense... but there are bugs around that code
    'wait_for_downstream' : False,              # depends_on_past is forced to True if wait_for_downstream
    'trigger_rule' : 'all_success',             # all_success is default anyway
    # Retries:
    'retries' : 0,                              # no retries
    # 'retry_delay' : timedelta(minutes=5),     # check retry_exponential_backoff and max_retry_delay too
    # Timeouts and SLAs:
    # 'sla' : timedelta(hours=1),               # default tasks' sla - normally don't run longer
    'execution_timeout' : timedelta(hours=3),   # no single task runs 3 hours or more
    # 'sla_miss_callback' : None,               # function to call when reporting SLA timeouts
    # Notifications:
    'email' : ['[email protected]'],
    'email_on_failure' : True,
    'email_on_retry' : True,
    # Resource usage:
    'pool' : 'DISCOVER-Prod',                   # can increase this pool's concurrency
    # 'queue' : 'some_queue',
    # 'priority_weight' : 10,
    # Miscellaneous:
    # 'on_failure_callback' : None,
    # 'on_success_callback' : None,
    # 'on_retry_callback' : None,
}
The DAG itself has a bunch of Oracle operators.
Any ideas?
This is a regression relative to a month-old Airflow build; no changes were made to the DAG.
Thank you,
Ruslan Dautkhanov