dstandish commented on a change in pull request #5787: [AIRFLOW-5172] Add
choice of interval edge scheduling
URL: https://github.com/apache/airflow/pull/5787#discussion_r340259297
##########
File path: airflow/jobs/scheduler_job.py
##########
@@ -648,7 +648,7 @@ def create_dag_run(self, dag, session=None):
if dag.schedule_interval == '@once':
period_end = next_run_date
elif next_run_date:
- period_end = dag.following_schedule(next_run_date)
+ period_end = dag.period_end(next_run_date)
Review comment:
it seems there is an inconsistency in the language here.
The function is called `period_end`, yet sometimes it returns the start of
the interval.
The name `period_end` for this variable reflects an assumption that dags are
scheduled after the _end_ of the interval.
That's why the code checks that `period_end <= timezone.utcnow()`: end of
interval is in the past.
But language of this PR is in conflict with that.
The parameter `schedule_at_interval_end` implies that the _interval_ doesn't
change, but _where we schedule_ does. So, we may schedule at start or end of
"the interval". But as written, if `schedule_at_interval_end=False`, it will
in general be the case that `period_end==excecution_date`, which implies that
exec date is the end of "the interval" and not the start, and this is a
contradiction.
It seems that what `period_end` represents in this code is more like
`run_after_dttm` -- the datetime before which the dag may not be scheduled.
When `schedule_at_interval_end` is True, we can run after exec date + 1
interval; otherwise, we can run after exec date.
So in this bit of code, we probably don't even need `period_end()` because
we could do this:
```
run_after_dttm = None
if dag.schedule_interval == '@once':
run_after_dttm = next_run_date
elif next_run_date and not self.schedule_at_interval_end:
run_after_dttm = next_run_date
elif next_run_date and self.schedule_at_interval_end:
run_after_dttm = dag.following_schedule(next_run_date)
```
And this:
```
if next_run_date and run_after_dttm and run_after_dttm <=
timezone.utcnow():
```
But elsewhere, it seems that `period_end()` function is used to mean
`min_run_date` or `target_run_date` or `run_after_date`. Perhaps a name like
this would be clearer.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
With regards,
Apache Git Services