mik-laj commented on a change in pull request #9544:
URL: https://github.com/apache/airflow/pull/9544#discussion_r522836502
##########
File path: airflow/models/dagrun.py
##########
@@ -565,6 +566,40 @@ def _are_premature_tis(
return True
return False
+    def _emit_true_scheduling_delay_stats_for_finished_state(self, finished_tis):
+        """
+        This is a helper method to emit the true scheduling delay stats, which is defined as
+        the time when the first task in DAG starts minus the expected DAG run datetime.
+        This method will be used in the update_state method when the state of the DagRun
+        is updated to a completed status (either success or failure). The method will find the first
+        started task within the DAG and calculate the expected DagRun start time (based on
+        dag.execution_date & dag.schedule_interval), and minus these two values to get the delay.
+        The emitted data may contain outliers (e.g. when the first task was cleared, so
+        the second task's start_date will be used), but we can get rid of the outliers
+        on the stats side through the dashboards tooling built.
+        Note, the stat will only be emitted if the DagRun is a scheduler triggered one
+        (i.e. external_trigger is False).
+        """
+        try:
+            if self.state == State.RUNNING:
+                return
+            if self.external_trigger:
+                return
+            if not finished_tis:
+                return
+            dag = self.get_dag()
+            ordered_tis_by_start_date = [ti for ti in finished_tis if ti.start_date]
+            ordered_tis_by_start_date.sort(key=lambda ti: ti.start_date, reverse=False)
+            first_start_date = getattr(ordered_tis_by_start_date[0], 'start_date', None)
Review comment:
I don't understand why `getattr` is used here. Can you tell me more about it?
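A minimal sketch of the delay arithmetic the docstring describes, and of why `getattr(..., 'start_date', None)` adds nothing once the list has been filtered on `ti.start_date`. It assumes a `timedelta` schedule interval and that a scheduled run is expected to start when its interval closes, i.e. at `execution_date + schedule_interval`; `TaskInstanceStub` and `first_task_scheduling_delay` are hypothetical names for illustration, not Airflow APIs.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class TaskInstanceStub:
    """Hypothetical stand-in for a TaskInstance; only start_date matters here."""
    task_id: str
    start_date: Optional[datetime]


def first_task_scheduling_delay(
    execution_date: datetime,
    schedule_interval: timedelta,
    finished_tis: List[TaskInstanceStub],
    external_trigger: bool = False,
) -> Optional[timedelta]:
    """Return first task start minus expected run start, or None if not applicable."""
    if external_trigger:
        # Only scheduler-triggered runs are measured, as the docstring states.
        return None
    # Keep only task instances that actually started.
    started = [ti for ti in finished_tis if ti.start_date]
    if not started:
        # Guard the empty case: indexing started[0] would raise IndexError,
        # which getattr(..., None) does not prevent.
        return None
    # After filtering, every element has a start_date, so plain attribute
    # access is enough. min() also avoids sorting the whole list just to
    # read its first element.
    first_start_date = min(ti.start_date for ti in started)
    # Assumption: the run is expected to start at execution_date + schedule_interval.
    expected_start = execution_date + schedule_interval
    return first_start_date - expected_start


tis = [
    TaskInstanceStub("extract", datetime(2020, 1, 2, 0, 5)),
    TaskInstanceStub("load", datetime(2020, 1, 2, 0, 9)),
    TaskInstanceStub("skipped", None),
]
print(first_task_scheduling_delay(datetime(2020, 1, 1), timedelta(days=1), tis))  # 0:05:00
```

For a daily schedule with execution date 2020-01-01, the run is expected at 2020-01-02 00:00, and the earliest started task began at 00:05, so the sketch reports a five-minute delay.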