uranusjr commented on a change in pull request #16352:
URL: https://github.com/apache/airflow/pull/16352#discussion_r679619000



##########
File path: airflow/jobs/backfill_job.py
##########
@@ -755,8 +759,13 @@ def _execute(self, session=None):
 
         start_date = self.bf_start_date
 
-        # Get intervals between the start/end dates, which will turn into dag 
runs
-        run_dates = self.dag.get_run_dates(start_date=start_date, 
end_date=self.bf_end_date, align=True)
+        # Get DagRun schedule between the start/end dates, which will turn 
into dag runs.
+        dagrun_start_date = timezone.coerce_datetime(start_date)
+        if self.bf_end_date is None:
+            dagrun_end_date = pendulum.now(timezone.utc)
+        else:
+            dagrun_end_date = pendulum.instance(self.bf_end_date)

Review comment:
       Personally I dislike the pattern “if end if None, default to now” since 
it’s too implicit, and users (other contributors of Airflow) can accidentally 
forget to handle a possibly-None input and cause subtle bugs. For example, say 
there’s a view to return scheduled run dates of a DAG:
   
   ```python
   # If the user does not specify, defaults to the DAG’s start_date.
   start_date = request.values.get("start_date")
   
   # If the user does not specify, defaults to the DAG’s end_date.
   end_date = request.values.get("end_date")
   
   # Oops I forgot to actually check if the user-supplied start_date and 
end_date is None.
   
   # Now the runs returned are wrong.
   runs = list(dag.iter_dagrun_infos_between(start_date, end_date))
   ```
   
   This can manage to slip through unit tests (if you forgot to do this in 
code, chances are you wouldn’t remember to test it) and cause either bugs or 
(worse) incorrect user expectations we need to support for backward 
compatibility onwards (Hyrum’s law).
   
   Making `iter_dagrun_infos_between` accept only datetimes forces the caller 
to think it through when exactly the iterator should end, which is a good thing 
since this can potentially run for a *long* time if `earliest` is a long time 
in the past.
   
   ```python
   # Mypy now raises an error here and I’m reminded of the None case.
   runs = list(dag.iter_dagrun_infos_between(start_date, end_date))
   ```




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to