dviru opened a new issue #22612:
URL: https://github.com/apache/airflow/issues/22612


   ### Apache Airflow version
   
   2.2.4 (latest released)
   
   ### What happened
   
   Hi Team, I am using Airflow 2.2.4 deployed on an AWS EKS cluster. Every 5-10 minutes, the Airflow UI shows a message that the scheduler is down. When I checked the scheduler log, I saw many statements like the one below.
   
   `[2022-03-21 08:21:21,640] {kubernetes_executor.py:729} INFO - Attempting to adopt pod sampletask.05b6f567b4a64bd5beb16e526ba94d7a`
   
   This statement is printed for every completed pod that exists in EKS, but it repeats multiple times, and each time the PATCH API is also invoked.
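
To confirm that the same pods are being adopted again and again, the scheduler log can be scanned for the message quoted above. This is a minimal Python sketch, assuming each relevant log line contains `Attempting to adopt pod <name>` exactly as in the excerpt; `count_adoption_attempts` and `repeatedly_adopted` are hypothetical helper names.

```python
import re
from collections import Counter

# Matches the adoption message quoted in this issue, e.g.
# "... INFO - Attempting to adopt pod sampletask.05b6f567b4a64bd5beb16e526ba94d7a"
ADOPT_RE = re.compile(r"Attempting to adopt pod (\S+)")


def count_adoption_attempts(log_lines):
    """Map each pod name to how many times an adoption attempt was logged for it."""
    counts = Counter()
    for line in log_lines:
        match = ADOPT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def repeatedly_adopted(log_lines, min_attempts=2):
    """Return only the pods whose adoption was logged at least `min_attempts` times."""
    return {
        pod: n
        for pod, n in count_adoption_attempts(log_lines).items()
        if n >= min_attempts
    }
```

For example, `with open("scheduler.log") as f: print(repeatedly_adopted(f))`, where `scheduler.log` is a placeholder path for an exported scheduler log.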
   
   As far as I understand, the code below fetches the details of all completed pods from the EKS cluster every time and invokes the PATCH API on each completed pod. For 1,000 completed pods this finishes in about 1 minute; for 7,000 completed pods it takes 3-5 minutes. That is why the scheduler goes down.
   
   <img width="1054" alt="160352813-9ff57de3-782f-4cee-8f7c-f6d5b8a60d29" src="https://user-images.githubusercontent.com/10843400/160741990-838f15e2-485c-4c9a-8ca7-c7014e14f0b4.png">
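
The behaviour described above can be modelled with a small, self-contained Python sketch. `FakeKubeClient`, `adopt_completed_pods`, and `estimated_pass_seconds` are hypothetical names, not Airflow's code; the sketch only illustrates that when completed pods are never deleted, every pass issues one PATCH per accumulated pod, so each pass gets slower as pods pile up and the same work is repeated on the next pass.

```python
from dataclasses import dataclass, field


@dataclass
class FakeKubeClient:
    """Hypothetical stand-in for the Kubernetes API that only counts calls."""
    completed_pods: list = field(default_factory=list)
    list_calls: int = 0
    patch_calls: int = 0

    def list_completed_pods(self):
        # Completed pods are never removed here, mirroring delete_worker_pods = False.
        self.list_calls += 1
        return list(self.completed_pods)

    def patch_pod(self, pod_name):
        # One API round trip per pod.
        self.patch_calls += 1


def adopt_completed_pods(client):
    """Simplified model of the reported pass: list every completed pod, then
    send one PATCH request per pod, sequentially."""
    for pod_name in client.list_completed_pods():
        client.patch_pod(pod_name)


def estimated_pass_seconds(n_pods, seconds_per_patch):
    """Rough duration of one pass if PATCH requests are sent one after another."""
    return n_pods * seconds_per_patch


fake = FakeKubeClient(completed_pods=[f"task-{i}" for i in range(7000)])
for _ in range(3):  # three passes, e.g. three periodic adoption checks
    adopt_completed_pods(fake)
print(fake.patch_calls)  # prints 21000
```

The reported timings (1,000 pods in about 1 minute, 7,000 in 3-5 minutes) suggest the per-pod cost is not constant, so `seconds_per_patch` should be treated as a rough, assumed figure rather than a measured one.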
   
   
   
   ### What you think should happen instead
   
   The scheduler stays healthy when we set `delete_worker_pods = True`, but when `delete_worker_pods = False` and the number of completed pods reaches 7,000 to 10,000, the scheduler goes down.
   
   The scheduler should stay healthy regardless of how many completed pods exist in the EKS cluster.
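
One possible direction for keeping each pass independent of the total number of completed pods is to skip pods that are already labelled with the current scheduler's ID. The sketch below is a hypothetical model, not Airflow's code: `FakeKubeClient`, its owner label, and `adopt_unowned_completed_pods` are illustrative names. Against a real cluster, a similar filter could be expressed server-side with a Kubernetes equality-based label selector of the form `<label>!=<scheduler id>`, assuming the pods carry such an owner label.

```python
from dataclasses import dataclass, field


@dataclass
class FakeKubeClient:
    """Hypothetical stand-in for the Kubernetes API. `pods` maps each completed
    pod's name to the scheduler ID stored in an assumed owner label."""
    pods: dict = field(default_factory=dict)
    patch_calls: int = 0

    def list_completed_pods(self, exclude_owner=None):
        # Mimics server-side filtering with a `<label>!=<value>` selector.
        return [name for name, owner in self.pods.items() if owner != exclude_owner]

    def patch_owner(self, pod_name, owner):
        self.patch_calls += 1
        self.pods[pod_name] = owner


def adopt_unowned_completed_pods(client, scheduler_id):
    """Patch only the completed pods not yet labelled with `scheduler_id`, so a
    repeated pass costs O(newly completed pods) instead of O(all completed pods)."""
    for pod_name in client.list_completed_pods(exclude_owner=scheduler_id):
        client.patch_owner(pod_name, scheduler_id)


fake = FakeKubeClient(pods={f"task-{i}": "old-scheduler" for i in range(7000)})
adopt_unowned_completed_pods(fake, "scheduler-2")  # first pass: 7000 PATCH calls
adopt_unowned_completed_pods(fake, "scheduler-2")  # second pass: nothing to patch
print(fake.patch_calls)  # prints 7000
```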
   
   ### How to reproduce
   
   Deploy Airflow in a Kubernetes cluster and set `delete_worker_pods = False`. Once the number of completed pods reaches 7,000 to 10,000, you will be able to see this issue.
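
To check how close a cluster is to the 7,000-10,000 range mentioned above, completed worker pods can be counted with the official Kubernetes Python client, whose `CoreV1Api.list_namespaced_pod` accepts `field_selector`, `label_selector`, `limit`, and `_continue`. This is a sketch under assumptions: it assumes worker pods carry the `kubernetes_executor=True` label and that `status.phase=Succeeded` identifies completed pods; `count_completed_worker_pods` is a hypothetical name.

```python
def count_completed_worker_pods(core_v1, namespace,
                                label_selector="kubernetes_executor=True",
                                page_size=500):
    """Count Succeeded pods matching `label_selector` in `namespace`.

    `core_v1` is expected to be a kubernetes.client.CoreV1Api instance (or any
    object with a compatible `list_namespaced_pod`). Results are fetched in pages
    via `limit` / `_continue` so thousands of pods are not returned in one response.
    """
    total = 0
    token = None
    while True:
        kwargs = {
            "field_selector": "status.phase=Succeeded",
            "label_selector": label_selector,
            "limit": page_size,
        }
        if token:
            kwargs["_continue"] = token  # resume from the previous page
        resp = core_v1.list_namespaced_pod(namespace, **kwargs)
        total += len(resp.items)
        token = resp.metadata._continue  # empty or None on the last page
        if not token:
            return total
```

With a kubeconfig available, it could be called as `count_completed_worker_pods(client.CoreV1Api(), "airflow")` after `config.load_kube_config()` (both from the `kubernetes` package); the `"airflow"` namespace is a placeholder.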
   
   ### Operating System
   
   OS: Debian GNU/Linux, VERSION: 10
   
   ### Versions of Apache Airflow Providers
   
   _No response_
   
   ### Deployment
   
   Other Docker-based deployment
   
   ### Deployment details
   
   _No response_
   
   ### Anything else
   
   _No response_
   
   ### Are you willing to submit PR?
   
   - [X] Yes I am willing to submit a PR!
   
   ### Code of Conduct
   
   - [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)
   

