dirrao opened a new issue, #31958:
URL: https://github.com/apache/airflow/issues/31958

   ### Description
   
   The function clear_not_launched_queued_tasks is slow when there are many 
queued tasks (a few hundred). The latency comes from a separate 
list_namespaced_pod Kubernetes API call being issued for each queued task, 
which delays the scheduler heartbeat. Improve this function by calling 
list_namespaced_pod with pagination so that all the required pods are fetched 
in fewer calls.
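
   A minimal sketch of the paginated approach, not the actual patch. It assumes 
the client object exposes list_namespaced_pod with the `limit` and `_continue` 
keyword arguments, as the Kubernetes Python client's CoreV1Api does. The helper 
names iter_worker_pods and launched_task_keys, the page size, and the label 
names used to identify a task are illustrative assumptions.

```python
from typing import Any, Iterator, Optional, Set, Tuple


def iter_worker_pods(
    kube_client: Any,
    namespace: str,
    label_selector: str,
    page_size: int = 100,
) -> Iterator[Any]:
    """Yield every pod matching label_selector, one page per API call.

    kube_client is assumed to behave like kubernetes.client.CoreV1Api.
    """
    continue_token: Optional[str] = None
    while True:
        kwargs = {"label_selector": label_selector, "limit": page_size}
        if continue_token:
            # Resume the listing where the previous page stopped.
            kwargs["_continue"] = continue_token
        pod_list = kube_client.list_namespaced_pod(namespace, **kwargs)
        yield from pod_list.items
        continue_token = pod_list.metadata._continue
        if not continue_token:
            # An empty token means the server has no more pages.
            break


def launched_task_keys(pods) -> Set[Tuple[Any, Any, Any]]:
    """Collect (dag_id, task_id, run_id) for each pod.

    The label names are assumptions made for this sketch.
    """
    keys = set()
    for pod in pods:
        labels = pod.metadata.labels or {}
        keys.add((labels.get("dag_id"), labels.get("task_id"), labels.get("run_id")))
    return keys
```

   With this shape, the function would list the worker pods once per scheduler 
loop, build the set of keys, and check each queued task against that set in 
memory instead of issuing one API call per task.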
   
   ### Use case/motivation
   
   Running Airflow at large scale, we have found that the 
clear_not_launched_queued_tasks function can take several minutes (> 5 
minutes). This delays the scheduler heartbeat and leads to the 
scheduler instance being restarted or killed. To avoid this, call 
list_namespaced_pod with pagination and fetch all the worker pods in fewer 
calls.
   
   ### Related issues
   
   _No response_
   
   ### Are you willing to submit a PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of 
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

