1fanwang opened a new pull request, #66807: URL: https://github.com/apache/airflow/pull/66807

### Problem

`dagrun.first_task_scheduling_delay` measures `data_interval_end → first_start_date`, which conflates two distinct latencies: scheduler latency to enqueue the first task, and executor latency to pick the task up. When a Dag run's first task starts late, that single timer can't tell ops which phase is slow. The executor-pickup portion (`queued_at → first_start_date`) has no metric today.

### Fix

Add `dagrun.first_task_start_delay`, computed as `first_start_date - queued_at` on Dag run completion, tagged by `dag_id` and `run_type` to match the existing tag shape on `first_task_scheduling_delay`. It is emitted next to the existing scheduling-delay metric, only when `queued_at` is set and the delta is positive. The existing metric is unchanged.

### Tests

`test_emit_first_task_start_delay` constructs a scheduled Dag run with a known `queued_at` and a known first-task `start_date`, calls `update_state`, mocks `stats.timing`, and asserts the new metric is emitted with the expected delta and tags. A parametrised case with `queued_at = None` confirms the new metric stays off when no `queued_at` is recorded.

Closes #66802
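
For reviewers, the emission rule described under Fix can be sketched roughly as follows. This is an illustrative, standalone sketch and not the code in this PR: the helper names `first_task_start_delay` and `emit_first_task_start_delay` are hypothetical, and the `timing` argument is a stand-in callable assumed to accept a metric name, a `timedelta`, and a `tags` dict.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


def first_task_start_delay(
    queued_at: Optional[datetime],
    first_start_date: Optional[datetime],
) -> Optional[timedelta]:
    """Return first_start_date - queued_at, or None when it should not be emitted.

    Hypothetical helper: mirrors the rule "only when queued_at is set
    and the delta is positive".
    """
    if queued_at is None or first_start_date is None:
        return None
    delay = first_start_date - queued_at
    if delay <= timedelta(0):
        # Zero or negative deltas are skipped rather than reported.
        return None
    return delay


def emit_first_task_start_delay(
    timing: Callable[..., None],  # assumed stand-in for a stats timing call
    queued_at: Optional[datetime],
    first_start_date: Optional[datetime],
    dag_id: str,
    run_type: str,
) -> bool:
    """Emit the metric through `timing` if applicable; return whether it was emitted."""
    delay = first_task_start_delay(queued_at, first_start_date)
    if delay is None:
        return False
    timing(
        "dagrun.first_task_start_delay",
        delay,
        tags={"dag_id": dag_id, "run_type": run_type},
    )
    return True


if __name__ == "__main__":
    queued = datetime(2024, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
    started = queued + timedelta(seconds=12)
    # Print each emission instead of sending it to a metrics backend.
    emit_first_task_start_delay(
        lambda name, delta, tags: print(name, delta.total_seconds(), tags),
        queued,
        started,
        dag_id="example_dag",
        run_type="scheduled",
    )
```

Keeping the guard in a separate pure function makes the `queued_at = None` and non-positive-delta cases easy to cover without constructing a Dag run.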
