dstandish commented on a change in pull request #5787: [AIRFLOW-5172] Add 
choice of interval edge scheduling
URL: https://github.com/apache/airflow/pull/5787#discussion_r340259297
 
 

 ##########
 File path: airflow/jobs/scheduler_job.py
 ##########
 @@ -648,7 +648,7 @@ def create_dag_run(self, dag, session=None):
             if dag.schedule_interval == '@once':
                 period_end = next_run_date
             elif next_run_date:
-                period_end = dag.following_schedule(next_run_date)
+                period_end = dag.period_end(next_run_date)
 
 Review comment:
   it seems there is an inconsistency in the language here.  
   
   The function is called `period_end`, yet sometimes it returns the start of 
the interval. 
   
   The name `period_end` for this variable reflects an assumption that dags are 
scheduled after the _end_ of the interval.  
   
   That's why the code checks that `period_end <= timezone.utcnow()`: end of 
interval is in the past.
   
   But language of this PR is in conflict with that.  
   
   The parameter `schedule_at_interval_end` implies that the _interval_ doesn't 
change, but _where we schedule_ does.  So, we may schedule at start or end of 
"the interval".  But as written, if `schedule_at_interval_end=False`, it will 
in general be the case that `period_end==excecution_date`, which implies that 
exec date is the end of "the interval" and not the start, and this is a 
contradiction.
   
   It seems that what `period_end` represents in this code is more like 
`run_after_dttm` -- the datetime before which the dag may not be scheduled.  
When `schedule_at_interval_end` is True, we can run after exec date + 1 
interval; otherwise, we can run after exec date.  
   
   So in this bit of code, we probably don't even need `period_end()` because 
we could do this:
   ```
               run_after_dttm = None
               if dag.schedule_interval == '@once':
                   run_after_dttm = next_run_date
               elif next_run_date and not self.schedule_at_interval_end:
                   run_after_dttm = next_run_date
               elif next_run_date and self.schedule_at_interval_end:
                   run_after_dttm = dag.following_schedule(next_run_date)
   ```
   And this:
   ```
               if next_run_date and run_after_dttm and run_after_dttm <= 
timezone.utcnow():
   ```
   
   But elsewhere, it seems that `period_end()` function is used to mean 
`min_run_date` or `target_run_date` or `run_after_date`.  Perhaps a name like 
this would be clearer.
   
   
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to