jedcunningham commented on code in PR #41186:
URL: https://github.com/apache/airflow/pull/41186#discussion_r1712552339
##########
airflow/providers/cncf/kubernetes/executors/kubernetes_executor_utils.py:
##########
@@ -220,6 +220,16 @@ def process_status(
             # However, need to free the executor slot from the current executor.
             self.log.info("Event: pod %s adopted, annotations: %s", pod_name, annotations_string)
             self.watcher_queue.put((pod_name, namespace, ADOPTED, annotations, resource_version))
+        elif hasattr(pod.status, "reason") and pod.status.reason == "ProviderFailed":
Review Comment:
The reason I ask is that we already tolerate many pods that may be only
temporarily non-runnable, e.g. when no node has room for the pod, and simply
rely on the queued timeout to handle them. If `ProviderFailed` can be
temporary, giving the pod a chance to start is probably the better behavior
here.
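
To make the trade-off concrete, here is a rough sketch of the policy I have in
mind. It is not the executor's actual code: `decide`, `transient_reasons`, and
`queued_timeout` are hypothetical names for illustration, and whether
`ProviderFailed` belongs in the transient set is exactly the open question.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Set


def decide(
    reason: Optional[str],
    transient_reasons: Set[str],
    queued_at: datetime,
    now: datetime,
    queued_timeout: timedelta,
) -> str:
    """Return "fail" or "wait" for a pod that has not started running.

    Hypothetical helper: reasons listed in ``transient_reasons`` are
    tolerated and left to the queued timeout; any other reason fails fast.
    """
    # A reason we do not consider transient fails the task right away.
    if reason is not None and reason not in transient_reasons:
        return "fail"
    # Otherwise the pod gets a chance to start until the timeout elapses.
    if now - queued_at > queued_timeout:
        return "fail"
    return "wait"


if __name__ == "__main__":
    queued_at = datetime(2024, 8, 1, tzinfo=timezone.utc)
    one_minute_later = queued_at + timedelta(minutes=1)
    timeout = timedelta(minutes=10)

    # Treating ProviderFailed as transient: the pod keeps waiting.
    print(decide("ProviderFailed", {"ProviderFailed"}, queued_at, one_minute_later, timeout))
    # Treating it as permanent: the task fails immediately.
    print(decide("ProviderFailed", set(), queued_at, one_minute_later, timeout))
```

If the reason turns out to be always permanent, failing fast as the PR does is
fine; if it can clear on its own, the first branch would be too eager.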
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.