1fanwang opened a new issue, #66802: URL: https://github.com/apache/airflow/issues/66802
### Description `dagrun.first_task_scheduling_delay` measures `data_interval_end → first_start_date`, which conflates two things: (a) scheduler latency to enqueue the first task, and (b) executor latency to pick the task up. Splitting these helps locate where time is being spent when a DAG run starts late. The executor-pickup portion (`queued_at → first_start_date`) has no metric today. ### Use case / motivation When the first task of a DAG run starts late, ops want to know: was the scheduler slow to queue it, or was the executor slow to pick it up? One metric per phase. ### Proposal Add `dagrun.first_task_start_delay`, computed as `first_start_date - queued_at` on dag run completion. Tag by `dag_id` and `run_type` to match the existing tag shape on `first_task_scheduling_delay`. I expect there's a "do we want N metrics or one with M dimensions" discussion — flagging this as an issue first instead of a direct PR so the shape can settle before code. ### Are you willing to submit a PR? - [X] Yes ### Code of Conduct - [X] I agree to follow this project's Code of Conduct -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
