hterik opened a new pull request, #26778: URL: https://github.com/apache/airflow/pull/26778
After the scheduler has launched many pods, it keeps trying to re-adopt them by patching every pod. Each patch operation involves a remote API call, which can be very slow, and in the meantime the scheduler cannot do anything else. By ignoring the pods that already have the expected label, the list query result will be shorter and the number of patch requests much smaller. We had an unlucky moment in our environment where each patch operation started taking 100 ms; with 200 pods in flight, that adds up to 20 seconds of blocked scheduler.
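
To illustrate the idea described above, here is a minimal sketch, not the actual change in this PR. It assumes the pods carry labels named `kubernetes_executor` and `airflow-worker` (assumed names) and that the list query can exclude pods whose `airflow-worker` label already equals the current scheduler's id. The `matches` helper is a hypothetical local stand-in for the filtering the API server would do, and the 195/5 split of pods is made up for the example; only the 200 pods and 100 ms per patch come from the description.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    labels: dict = field(default_factory=dict)


def build_adoption_selector(worker_id: str) -> str:
    # Assumed label names. "!=" drops pods already labelled with this worker id.
    return f"kubernetes_executor=True,airflow-worker!={worker_id}"


def matches(selector: str, labels: dict) -> bool:
    # Hypothetical stand-in for server-side filtering.
    # Handles only "key=value" and "key!=value" requirements.
    for requirement in selector.split(","):
        if "!=" in requirement:
            key, value = requirement.split("!=", 1)
            if labels.get(key) == value:
                return False
        else:
            key, value = requirement.split("=", 1)
            if labels.get(key) != value:
                return False
    return True


def pods_to_patch(pods, worker_id):
    # Only pods that pass the selector would need a patch call.
    selector = build_adoption_selector(worker_id)
    return [p for p in pods if matches(selector, p.labels)]


PATCH_LATENCY_MS = 100  # per-patch latency quoted in the description

# Made-up split: 195 pods already adopted by scheduler job "42", 5 left from job "41".
pods = [Pod(f"pod-{i}", {"kubernetes_executor": "True", "airflow-worker": "42"}) for i in range(195)]
pods += [Pod(f"old-{i}", {"kubernetes_executor": "True", "airflow-worker": "41"}) for i in range(5)]

before_ms = len(pods) * PATCH_LATENCY_MS  # patch every pod
after_ms = len(pods_to_patch(pods, "42")) * PATCH_LATENCY_MS  # patch only unadopted pods
print(before_ms, after_ms)  # prints: 20000 500
```

With the made-up numbers above, the blocked time drops from 20 seconds to half a second because only the five pods that still carry another scheduler's label are patched.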
