yuqian90 commented on a change in pull request #5458: [AIRFLOW-4495] allow 
externally triggered dags to run for future 'Exe…
URL: https://github.com/apache/airflow/pull/5458#discussion_r318386767
 
 

 ##########
 File path: airflow/jobs/scheduler_job.py
 ##########
 @@ -684,13 +684,6 @@ def _process_task_instances(self, dag, task_instances_list, session=None):
         active_dag_runs = []
         for run in dag_runs:
             self.log.info("Examining DAG run %s", run)
-            # don't consider runs that are executed in the future
-            if run.execution_date > timezone.utcnow():
-                self.log.error(
-                    "Execution date is in future: %s",
-                    run.execution_date
-                )
-                continue
 
 Review comment:
   @XD-DENG I actually disagree with the statement "If your DagRun's
execution_date is in the future, for sure it should not be considered for
execution".
   
   We should think about what execution_date actually means. It is definitely
NOT the time at which the tasks in the DAG start running, so there is no reason
for the scheduler to ignore DAG runs whose execution_date is in the future. In
the current implementation, if execution_date is T and DAG.schedule_interval is
set to a cron expression, the first task in the DAG runs at T + one
schedule_interval, not at T. See https://airflow.apache.org/concepts.html:
   "The time Airflow triggers a DAG should equal execution_date plus one 
schedule interval if cron expression is used: The scheduler runs your job one 
schedule_interval AFTER the start date, at the END of the period"
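
   As a minimal illustration of that rule, assuming a fixed timedelta interval
rather than a cron expression (cron intervals such as "monthly" can vary in
length), a scheduled run stamped execution_date covers one period and is
started only once that period has ended. The helper name earliest_start is
hypothetical, not an Airflow API.

```python
from datetime import datetime, timedelta, timezone


def earliest_start(execution_date: datetime, schedule_interval: timedelta) -> datetime:
    """Hypothetical helper: a scheduled run stamped `execution_date` covers
    [execution_date, execution_date + schedule_interval) and is started only
    once that period has ended."""
    return execution_date + schedule_interval


# A daily run stamped 2019-08-01 is started at the end of that day (UTC).
run_stamp = datetime(2019, 8, 1, tzinfo=timezone.utc)
print(earliest_start(run_stamp, timedelta(days=1)))  # 2019-08-02 00:00:00+00:00
```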
   
   If we set DAG.schedule_interval to None and trigger the DAG externally, we
would naturally expect the scheduler to start considering its tasks for
execution as soon as the DAG is triggered, even if the execution_date is in the
future. In the current implementation, this is not the case. If execution_date
is 20190801, the earliest time the scheduler can consider the tasks in the DAG
is 20190801 00:00 UTC, because of the check removed in this hunk. So if someone
triggers the DAG externally at 20190731 23:00 UTC with execution_date 20190801,
the tasks will not be considered for execution until 20190801 00:00 UTC.
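
   The effect of the removed guard can be reproduced in isolation. This is a
standalone sketch: passes_future_date_check is a hypothetical function that
mimics the `run.execution_date > timezone.utcnow()` condition from the diff,
with `now` passed in explicitly instead of calling timezone.utcnow() so the
example is deterministic.

```python
from datetime import datetime, timezone


def passes_future_date_check(execution_date: datetime, now: datetime) -> bool:
    """Hypothetical mirror of the removed check: a run whose execution_date is
    later than the current UTC time is skipped (the scheduler logs an error
    and continues to the next run)."""
    return execution_date <= now


# Externally triggered at 2019-07-31 23:00 UTC with execution_date 2019-08-01.
trigger_time = datetime(2019, 7, 31, 23, 0, tzinfo=timezone.utc)
execution_date = datetime(2019, 8, 1, tzinfo=timezone.utc)

print(passes_future_date_check(execution_date, trigger_time))    # False: run is skipped
# The same run is first considered once the clock reaches 2019-08-01 00:00 UTC.
print(passes_future_date_check(execution_date, execution_date))  # True
```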
   
   This is very problematic if you are working with multiple timezones. E.g., if
you are in Asia/Tokyo and you want the Jinja templates to render ds_nodash as
20190801, you need to set execution_date to 20190801. But if you do that, the
DAG can never start running before 20190801 00:00 UTC, which is 20190801 09:00
Tokyo time. So if you want something to run at 8am Tokyo time on 20190801 with
execution_date 20190801, it is not possible (even if you trigger the DAG
externally before 9am).
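
   The timezone arithmetic behind this example can be checked with the standard
library alone. A sketch, assuming a fixed UTC+9 offset for Asia/Tokyo (Japan
does not observe daylight saving time) and modeling ds_nodash as
execution_date.strftime("%Y%m%d") on the UTC execution_date:

```python
from datetime import datetime, timedelta, timezone

# Asia/Tokyo is UTC+9 with no daylight saving time, so a fixed offset suffices here.
TOKYO = timezone(timedelta(hours=9))

execution_date = datetime(2019, 8, 1, tzinfo=timezone.utc)

# Modeling ds_nodash as the UTC date of execution_date: templates see 20190801.
ds_nodash = execution_date.strftime("%Y%m%d")
print(ds_nodash)  # 20190801

# With the future-date check, the earliest the run can be picked up, in Tokyo time:
print(execution_date.astimezone(TOKYO))  # 2019-08-01 09:00:00+09:00

# The desired start, 2019-08-01 08:00 Tokyo time, is still before execution_date.
desired_start = datetime(2019, 8, 1, 8, 0, tzinfo=TOKYO)
print(desired_start.astimezone(timezone.utc))  # 2019-07-31 23:00:00+00:00
print(desired_start < execution_date)  # True
```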
   
   Removing this check for DAGs with schedule_interval=None seems to be a move
in the right direction, although this pull request needs a lot more testing to
make sure it does the right thing, and users should get some warning before
this change goes out.
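
   One way to scope the change, sketched here as an assumption rather than what
this PR actually implements: keep the guard for scheduled DAGs and bypass it
only when schedule_interval is None. should_skip_future_run is a hypothetical
name, not scheduler code.

```python
from datetime import datetime, timedelta, timezone
from typing import Union


def should_skip_future_run(
    execution_date: datetime,
    now: datetime,
    schedule_interval: Union[str, timedelta, None],
) -> bool:
    """Hypothetical variant of the guard: DAGs with schedule_interval=None only
    run when triggered externally, so they are considered as soon as they are
    triggered; scheduled DAGs still skip future-dated runs."""
    if schedule_interval is None:
        return False
    return execution_date > now


now = datetime(2019, 7, 31, 23, 0, tzinfo=timezone.utc)
future = datetime(2019, 8, 1, tzinfo=timezone.utc)

print(should_skip_future_run(future, now, None))               # False: triggered-only DAG runs now
print(should_skip_future_run(future, now, timedelta(days=1)))  # True: scheduled DAG still waits
```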

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 


With regards,
Apache Git Services
