dstandish commented on a change in pull request #5372: [AIRFLOW-3057] add 
prev_*_date_success to template context
URL: https://github.com/apache/airflow/pull/5372#discussion_r291314335
 
 

 ##########
 File path: airflow/models/taskinstance.py
 ##########
 @@ -528,25 +532,43 @@ def _get_previous_ti(self, session=None):
                 if not previous_scheduled_date:
                     return None
 
-                return TaskInstance(task=self.task,
-                                    execution_date=previous_scheduled_date)
+                return TaskInstance(task=self.task, 
execution_date=previous_scheduled_date)
 
             dr.dag = dag
-            if dag.catchup:
-                last_dagrun = dr.get_previous_scheduled_dagrun(session=session)
+            if dag.catchup is False or state is not None or 
dag.schedule_interval is None:
 
 Review comment:
   > The reasoning behind this `if` is hard to work out - could you expand on 
it with a comment saying why?
   
   This is what I ended up with:
   ```
   # We always ignore schedule in dagrun lookup when `state` is given or 
`schedule_interval is None`.
   # For legacy reasons, when `catchup=True`, we use 
`get_previous_scheduled_dagrun` unless `ignore_schedule`.
   ignore_schedule = state is not None or dag.schedule_interval is None
   if dag.catchup is True and not ignore_schedule:
       last_dagrun = dr.get_previous_scheduled_dagrun(session=session)
   else:
       last_dagrun = dr.get_previous_dagrun(session=session, state=state)
   ```
   
   **Discussion**
   
   With status quo, if `catchup is True`, `previous_ti` is "last ti according 
to schedule, if one exists", but if `catchup is False`, previous_ti is "last ti 
absolutely, if one exists".  
   
   When we do the spring cleaning of context params in 2.0, we should fix this 
and call things what they are, and not have behavior subtly change based on 
other params if it can be avoided.  
   
   E.g. I would say `previous_ti` should _always_ give the absolute last ti 
without regard to schedule, and we should make a separate property 
`previous_scheduled_ti` for the last _scheduled_ ti.  Then these methods would 
behave the same no matter `catchup` or `schedule_interval`; one would call prev 
dag run, the other would call prev scheduled dag run.  I would be happy to 
separate these in this manner in a subsequent PR if you like this idea.
   
   **Note on the `schedule is None` and `catchup=True` case**
   In status quo, for this odd case `previous_ti` _always_ returns `None`, 
because it is using `get_previous_scheduled_dagrun` when there is no schedule 
to use.  I discovered this when adding tests for `previous_ti`.  This PR fixes 
that by using `get_previous_dagrun` in this case.
   
   Please LMK thoughts!   In particular:
   * Are these comments sufficient?
   * Do you like the idea of splitting previous_ti (absolute) and 
previous_scheduled_ti (scheduled)?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services

Reply via email to