uranusjr commented on a change in pull request #16352:
URL: https://github.com/apache/airflow/pull/16352#discussion_r679619000
##########
File path: airflow/jobs/backfill_job.py
##########
@@ -755,8 +759,13 @@ def _execute(self, session=None):
start_date = self.bf_start_date
- # Get intervals between the start/end dates, which will turn into dag
runs
- run_dates = self.dag.get_run_dates(start_date=start_date,
end_date=self.bf_end_date, align=True)
+ # Get DagRun schedule between the start/end dates, which will turn
into dag runs.
+ dagrun_start_date = timezone.coerce_datetime(start_date)
+ if self.bf_end_date is None:
+ dagrun_end_date = pendulum.now(timezone.utc)
+ else:
+ dagrun_end_date = pendulum.instance(self.bf_end_date)
Review comment:
Personally I dislike the pattern “if end if None, default to now” since
it’s too implicit, and users (other contributors of Airflow) can accidentally
forget to handle a possibly-None input and cause subtle bugs. For example, say
there’s a view to return scheduled run dates of a DAG:
```python
# If the user does not specify, defaults to the DAG’s start_date.
start_date = request.values.get("start_date")
# If the user does not specify, defaults to the DAG’s end_date.
end_date = request.values.get("end_date")
# Oops I forgot to actually check if the user-supplied start_date and
end_date is None.
# Now the runs returned are wrong.
runs = list(dag.iter_dagrun_infos_between(start_date, end_date))
```
This can manage to slip through unit tests (if you forgot to do this in
code, chances are you wouldn’t remember to test it) and cause either bugs or
(worse) incorrect user expectations we need to support for backward
compatibility onwards (Hyrum’s law).
Making `iter_dagrun_infos_between` accept only datetimes forces the caller
to think it through when exactly the iterator should end, which is a good thing
since this can potentially run for a *long* time if `earliest` is a long time
in the past.
```python
# Mypy now raises an error here and I’m reminded of the None case.
runs = list(dag.iter_dagrun_infos_between(start_date, end_date))
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]