1fanwang opened a new issue, #66802:
URL: https://github.com/apache/airflow/issues/66802

   ### Description
   
   `dagrun.first_task_scheduling_delay` measures `data_interval_end → 
first_start_date`, which conflates two things: (a) scheduler latency to enqueue 
the first task, and (b) executor latency to pick the task up. Splitting these 
helps locate where time is being spent when a DAG run starts late.
   
   The executor-pickup portion (`queued_at → first_start_date`) has no metric 
today.
   
   ### Use case / motivation
   
   When the first task of a DAG run starts late, ops want to know: was the 
scheduler slow to queue it, or was the executor slow to pick it up? One metric 
per phase.
   
   ### Proposal
   
   Add `dagrun.first_task_start_delay`, computed as `first_start_date - 
queued_at` on dag run completion. Tag by `dag_id` and `run_type` to match the 
existing tag shape on `first_task_scheduling_delay`.
   
   I expect there's a "do we want N metrics or one with M dimensions" 
discussion — flagging this as an issue first instead of a direct PR so the 
shape can settle before code.
   
   ### Are you willing to submit a PR?
   
   - [X] Yes
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's Code of Conduct
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to