dengpenn opened a new issue, #39215:
URL: https://github.com/apache/airflow/issues/39215

   ### Description
   
   During our adoption of Airflow, we found that the scheduler can create hundreds 
of pods during the main scheduling loop. I propose adding two kinds of metrics: 
the response code returned by the k8s client, and the latency of creating, 
patching, and deleting pods.
   
   ### Use case/motivation
   
   The Airflow executor creates one pod for each individual task. During peak 
time, we saw 800+ tasks scheduled, and the latency of the underlying K8s API 
increased. Creating the task pods can delay the executor's heartbeat, which in 
turn can affect the scheduler's heartbeat. It would be good to have metrics that 
monitor the response code and the latency of the k8s API calls for creating, 
patching, and deleting pods.
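   
   A minimal sketch of how these two metrics could be recorded, assuming a 
StatsD-style client with `incr` and `timing` methods. Every name here 
(`InMemoryStats`, `FakeApiError`, `observe_k8s_call`, and the metric names) is 
hypothetical and only illustrates the idea; none of it is existing Airflow 
code. The sketch relies on the client exception exposing an HTTP `status` 
attribute, as the Kubernetes Python client's `ApiException` does.

```python
import time
from collections import Counter, defaultdict
from contextlib import contextmanager


class InMemoryStats:
    """Hypothetical stand-in for a StatsD-style metrics client."""

    def __init__(self):
        self.counters = Counter()
        self.timings = defaultdict(list)

    def incr(self, name):
        self.counters[name] += 1

    def timing(self, name, seconds):
        self.timings[name].append(seconds)


class FakeApiError(Exception):
    """Hypothetical stand-in for a client error that carries an HTTP status."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


@contextmanager
def observe_k8s_call(stats, operation):
    """Record latency and outcome for one pod API call (create/patch/delete)."""
    start = time.monotonic()
    outcome = "ok"  # label used when the wrapped call raises nothing
    try:
        yield
    except Exception as exc:
        # Use the HTTP status if the exception exposes one, else "unknown".
        outcome = getattr(exc, "status", "unknown")
        raise
    finally:
        # Emit both metrics whether the call succeeded or failed.
        stats.timing(f"k8s.pod.{operation}.latency", time.monotonic() - start)
        stats.incr(f"k8s.pod.{operation}.response.{outcome}")


stats = InMemoryStats()

# Successful call: the real executor would invoke the k8s client here.
with observe_k8s_call(stats, "create"):
    pass

# Failing call: the error is still re-raised after the metrics are emitted.
try:
    with observe_k8s_call(stats, "delete"):
        raise FakeApiError(409)
except FakeApiError:
    pass
```

   Emitting the metrics in `finally` means slow or failing calls are still 
counted, which is the case that matters most during a scheduling peak.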
   
   ### Related issues
   
   N/A
   
   ### Are you willing to submit a PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
