yuqian90 commented on a change in pull request #5458: [AIRFLOW-4495] allow
externally triggered dags to run for future 'Exe…
URL: https://github.com/apache/airflow/pull/5458#discussion_r318386767
##########
File path: airflow/jobs/scheduler_job.py
##########
@@ -684,13 +684,6 @@ def _process_task_instances(self, dag, task_instances_list, session=None):
         active_dag_runs = []
         for run in dag_runs:
             self.log.info("Examining DAG run %s", run)
-            # don't consider runs that are executed in the future
-            if run.execution_date > timezone.utcnow():
-                self.log.error(
-                    "Execution date is in future: %s",
-                    run.execution_date
-                )
-                continue
Review comment:
@XD-DENG I actually disagree with the statement "If your DagRun's
execution_date is in the future, for sure it should not be considered for
execution".
We should think about what execution_date is. It is definitely NOT the
datetime at which tasks in the DAG start running, so there is no reason to make
the scheduler ignore runs with an execution_date in the future. In the current
implementation, if execution_date is T and we set DAG.schedule to a cron
expression, the first task in the DAG runs at T + one schedule_interval, not
at T. See https://airflow.apache.org/concepts.html
"The time Airflow triggers a DAG should equal execution_date plus one
schedule interval if cron expression is used: The scheduler runs your job one
schedule_interval AFTER the start date, at the END of the period"
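To make the arithmetic in that quote concrete, here is a minimal sketch in plain Python. It does not use Airflow; `first_trigger_time` is a hypothetical helper, and a fixed `timedelta` stands in for the cron expression as a simplifying assumption.

```python
from datetime import datetime, timedelta


def first_trigger_time(execution_date: datetime,
                       schedule_interval: timedelta) -> datetime:
    """Hypothetical helper: when a scheduled run for `execution_date` is due.

    Models the documented rule that a run is triggered at the END of its
    period, i.e. execution_date plus one schedule interval.
    """
    return execution_date + schedule_interval


# Assumed example: a daily schedule and execution_date 2019-08-01.
execution_date = datetime(2019, 8, 1)
daily = timedelta(days=1)

print(first_trigger_time(execution_date, daily))  # 2019-08-02 00:00:00
```

So for a daily schedule, the run labelled 20190801 only starts on 20190802, which is why execution_date should not be read as the start time.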
If we set DAG.schedule to None and use an external trigger to trigger the DAG,
naturally we should expect the scheduler to start considering the tasks for
execution once the DAG is triggered, even if the execution_date is a future
date. In the current implementation, this is not the case. If execution_date is
20190801, the earliest time the tasks in the DAG can be considered by the
scheduler is 20190801 00:00 UTC because of this line of code here. So if
someone uses an external trigger to trigger the DAG at 20190731 23:00 UTC with
execution_date 20190801, the tasks will not be considered for execution until
20190801 00:00 UTC.
This is very problematic if you are working with multiple timezones. E.g., if
you are in Asia/Tokyo and you want the jinja templates to evaluate ds_nodash to
20190801, you need to make execution_date 20190801. But if you do that, the DAG
can never start running before 20190801 00:00 UTC, which is 20190801 09:00 Tokyo
time. So if you want something to run at 20190801 08:00 Tokyo time with
execution_date 20190801, it is not possible (even if you trigger the DAG
externally before 09:00).
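The timezone arithmetic can be checked with the standard library alone. This is a sketch, not Airflow code: the fixed +09:00 offset for Tokyo and the variable names are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
# Assumption: a fixed +09:00 offset is used for Asia/Tokyo (no DST there).
TOKYO = timezone(timedelta(hours=9), "Asia/Tokyo")

# execution_date 20190801, interpreted as midnight UTC.
execution_date = datetime(2019, 8, 1, tzinfo=UTC)

# Earliest moment the existing check lets the run through, in Tokyo time.
earliest_tokyo = execution_date.astimezone(TOKYO)
print(earliest_tokyo)  # 2019-08-01 09:00:00+09:00

# The desired start: 08:00 Tokyo time on the same calendar day.
desired_start = datetime(2019, 8, 1, 8, 0, tzinfo=TOKYO)
desired_utc = desired_start.astimezone(UTC)
print(desired_utc)  # 2019-07-31 23:00:00+00:00

# Same comparison as the removed check: execution_date is still "in the
# future" at the desired start time, so the run would be skipped.
blocked = execution_date > desired_utc
print(blocked)  # True
```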
Removing this check when DAG.schedule == None seems to be a move in the right
direction, although this pull request needs a lot more testing to make sure it
does the right thing, and people need to be given some warning before this
change goes out.
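One possible shape for the relaxed check, as a standalone sketch. `should_skip_run` is a hypothetical function, not something in scheduler_job.py or in this pull request, and it assumes the schedule is passed in as a plain value that is None for externally triggered DAGs.

```python
from datetime import datetime, timezone
from typing import Optional


def should_skip_run(execution_date: datetime,
                    now: datetime,
                    schedule_interval: Optional[object]) -> bool:
    """Hypothetical gate: should the scheduler ignore this DAG run for now?

    Assumption: externally triggered DAGs have no schedule (None), and for
    those a future execution_date is allowed. Scheduled DAGs keep the
    existing behaviour of skipping future execution dates.
    """
    if schedule_interval is None:
        return False
    return execution_date > now


now = datetime(2019, 7, 31, 23, 0, tzinfo=timezone.utc)
future = datetime(2019, 8, 1, tzinfo=timezone.utc)

print(should_skip_run(future, now, None))         # False
print(should_skip_run(future, now, "0 0 * * *"))  # True
```

Keeping the check for scheduled DAGs limits the behaviour change to the externally triggered case, which may make it easier to test and to announce.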