Acehaidrey commented on a change in pull request #9544:
URL: https://github.com/apache/airflow/pull/9544#discussion_r452511482
##########
File path: airflow/models/dagrun.py
##########
@@ -411,6 +412,44 @@ def _are_premature_tis(
return True
return False
+ @provide_session
+ def _emit_true_scheduling_delay_stats_for_finished_state(self,
session=None):
+ """
+ This is a helper method to emit the true scheduling delay stats, which
is defined as
+ the time when the first task in DAG starts minus the expected DAG run
datetime.
+ This method will be used in the update_state method when the state of
the DagRun
+ is updated to a completed status (either success or failure). The
method will find the first
+ started task within the DAG and calculate the expected DagRun start
time (based on
+ dag.execution_date & dag.schedule_interval), and subtract the two to
get the delay.
+
+ The emitted data may contain outliers (e.g. when the first task was
cleared, so
+ the second task's start_date will be used), but we can get rid of the
outliers
+ on the stats side through the dashboards.
+
+ Note, the stat will only be emitted if the DagRun is a scheduler
triggered one
+ (i.e. external_trigger is False).
+ """
+ if self.state == State.RUNNING:
+ return
+
+ try:
+ if self.external_trigger:
+ return
+ # Get the task that has the earliest start_date
+ qry = session.query(TI).filter(
Review comment:
I just need the actual start time, so I can filter down even more on the
cardinality. Will add that improvement, thanks!
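
For illustration only, here is a rough sketch of the idea: ask the database for
just the earliest start time instead of loading whole task instance rows, then
subtract the expected run start. It uses the standard-library sqlite3 module
rather than the SQLAlchemy session in the PR, and the table layout, the column
names, and the helper name true_scheduling_delay are all assumptions made up
for this example, not Airflow's schema or the code that will land in the PR.

```python
import sqlite3
from datetime import datetime, timedelta


def true_scheduling_delay(conn, dag_id, execution_date, schedule_interval):
    """Return the delay in seconds, or None if no task has started.

    Hypothetical helper: only MIN(start_date) is fetched, so a single scalar
    comes back from the database instead of full task instance rows.
    """
    row = conn.execute(
        "SELECT MIN(start_date) FROM task_instance "
        "WHERE dag_id = ? AND execution_date = ? AND start_date IS NOT NULL",
        (dag_id, execution_date.isoformat()),
    ).fetchone()
    if row[0] is None:
        return None
    first_start = datetime.fromisoformat(row[0])
    # Expected start of the run, following the docstring in the diff:
    # execution_date plus one schedule interval.
    expected_start = execution_date + schedule_interval
    return (first_start - expected_start).total_seconds()


# Simplified stand-in table (assumed layout, ISO-8601 strings for timestamps).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_instance ("
    "dag_id TEXT, task_id TEXT, execution_date TEXT, start_date TEXT)"
)
execution_date = datetime(2020, 7, 1)
schedule_interval = timedelta(days=1)
conn.executemany(
    "INSERT INTO task_instance VALUES (?, ?, ?, ?)",
    [
        ("example_dag", "first", execution_date.isoformat(),
         datetime(2020, 7, 2, 0, 1, 30).isoformat()),
        ("example_dag", "second", execution_date.isoformat(),
         datetime(2020, 7, 2, 0, 5, 0).isoformat()),
        # A task that has not started yet must not affect the result.
        ("example_dag", "third", execution_date.isoformat(), None),
    ],
)

delay = true_scheduling_delay(conn, "example_dag", execution_date, schedule_interval)
print(delay)
```

In this made-up data the earliest task starts 90 seconds after the expected run
start, so the sketch prints 90.0. A DAG with no started tasks yields None, which
a caller could treat as "nothing to emit".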