dstandish commented on a change in pull request #22184:
URL: https://github.com/apache/airflow/pull/22184#discussion_r828760938



##########
File path: airflow/dag_processing/processor.py
##########
@@ -412,20 +410,20 @@ def manage_slas(self, dag: DAG, session: Session = None) 
-> None:
             else:
                 while next_info.logical_date < ts:
                     next_info = dag.next_dagrun_info(next_info.data_interval, 
restricted=False)
-
                     if next_info is None:
                         break
-                    if (ti.dag_id, ti.task_id, next_info.logical_date) in 
recorded_slas_query:
+                    next_run_id = DR.generate_run_id(DagRunType.SCHEDULED, 
next_info.logical_date)
+                    if (ti.dag_id, ti.task_id, next_run_id, ti.map_index) in 
recorded_sla_misses:

Review comment:
       though it is weird ... i don't understand why we immediately call next 
run here
   
https://github.com/apache/airflow/blob/main/airflow/dag_processing/processor.py#L414
   
   we're already at "the next run" relative to the last completed TI -- why 
don't we see if _that_ run has an SLA miss?  it seems we skip to "the run 
_after_ the run after" the latest TI.  so that we could only get an SLA miss if 
airflow is 2 runs behind 🤪 
   
   i need to actually do some live testing to understand how this actually 
behaves.
   
   update: i did some live testing, and yeah, SLAs are all screwed up
   
   here's an example:
   
![image](https://user-images.githubusercontent.com/15932138/158741020-7b21db86-8cbb-439c-9c66-aecbe4365e50.png)
   
   i created a task that is sleep(300), SLA of one minute, and runs every 5 
minutes.  here you can see that first TI gets no miss (expected) and the second 
TI does not get any miss.  the one that is created is always 2 runs ahead of 
the latest successful TI 🤦 
   
   and one ahead of the one that is running.
   
   also visible here is an odd quirk that appears to be a bug re timetables or 
data intervals or something in that area: the execution dates are wiggly i.e. 
not a value `N * timedelta + dag.start_date`.  they have seemingly random 
millis precision.  and as a consequence, not only are the SlaMiss records in 
the future, but i have found that they may _never_ correspond to TIs that 
actually come into existence. 🤷 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to