dirrao commented on issue #22612: URL: https://github.com/apache/airflow/issues/22612#issuecomment-2089951478
We are seeing this issue in Airflow 2.3.3. Based on the latest code, I strongly believe it is still present in the latest Airflow version, 2.9.1, because I don't see any improvements in watcher performance. The primary cause is that the Kubernetes pod watcher is not fast enough to keep up with the Kubernetes event rate. This leads to watcher failures and restarts, after which adopt_complete_pods takes over the completed pods. adopt_complete_pods takes a couple of minutes, which delays the scheduler heartbeat, then causes scheduler liveness failures, and finally a scheduler pod restart.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
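To make the failure chain in the comment above concrete, here is a rough back-of-the-envelope sketch in Python. It is a toy model, not Airflow code: the function names, the rates, the loop interval, and the liveness threshold are all hypothetical values chosen for illustration, and it assumes constant event and processing rates.

```python
def backlog_after(seconds, events_per_sec, handled_per_sec):
    """Unprocessed events after `seconds`, assuming constant rates and an empty queue at the start."""
    return max(0.0, (events_per_sec - handled_per_sec) * seconds)


def heartbeat_gap(loop_interval_sec, blocking_call_sec):
    """Time between two heartbeats when one loop iteration also runs a blocking call."""
    return loop_interval_sec + blocking_call_sec


def liveness_fails(gap_sec, threshold_sec):
    """True when the gap between heartbeats exceeds the liveness threshold."""
    return gap_sec > threshold_sec


if __name__ == "__main__":
    # Hypothetical numbers: the watcher handles 40 events/s while 50 events/s arrive.
    print("backlog after 60 s:", backlog_after(60, events_per_sec=50, handled_per_sec=40))

    # Hypothetical numbers: a 5 s scheduler loop that also runs a 120 s adoption pass,
    # checked against a 30 s liveness threshold.
    gap = heartbeat_gap(loop_interval_sec=5, blocking_call_sec=120)
    print("heartbeat gap:", gap, "liveness fails:", liveness_fails(gap, threshold_sec=30))
```

Under these assumed numbers, the backlog grows whenever events arrive faster than they are handled, and a multi-minute blocking call in the scheduler loop is enough on its own to exceed a short liveness threshold.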
