Re: [PR] [KubernetesPodOperator] Detection of different timeouts for schedule and startup state [airflow]

via GitHub Tue, 06 May 2025 12:48:04 -0700


jscheffl commented on code in PR #49784:
URL: https://github.com/apache/airflow/pull/49784#discussion_r2076147473



##########
providers/cncf/kubernetes/src/airflow/providers/cncf/kubernetes/operators/pod.py:
##########
@@ -157,8 +157,9 @@ class KubernetesPodOperator(BaseOperator):
     :param reattach_on_restart: if the worker dies while the pod is running, reattach and monitor
         during the next try. If False, always create a new pod for each try.
     :param labels: labels to apply to the Pod. (templated)
-    :param startup_timeout_seconds: timeout in seconds to startup the pod.
+    :param startup_timeout_seconds: timeout in seconds to startup the pod after pod was scheduled.

Review Comment:
   No, in the proposal both timeouts are measured independently, as separate
"stages". The idea is that, for example, if you have a very busy node pool,
scheduling can take a long time, but once the pod is assigned to a node
(= scheduled) you would expect a shorter time to pull the image and get
started. At least in our environment, scheduling can also take longer because
nodes might need to spin up. But we have seen cases where nodes get stuck
starting the container, and a very long startup timeout then masks node
problems.
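
A minimal sketch of the two independent stages described above, assuming a hypothetical `get_status()` callable that reports `(scheduled, running)` for the pod (in Kubernetes, "scheduled" would correspond to the `PodScheduled` condition becoming true). The names `await_pod_start`, `schedule_timeout_seconds` and `PodLaunchTimeoutError` are illustrative only and are not the operator's actual implementation from this PR. The key point is that the startup timer restarts at the moment of scheduling, so a slow node pool does not consume the startup budget.

```python
import time
from typing import Callable, Tuple


class PodLaunchTimeoutError(Exception):
    """Hypothetical error raised when a pod exceeds the timeout of a stage."""


def await_pod_start(
    get_status: Callable[[], Tuple[bool, bool]],
    schedule_timeout_seconds: float,
    startup_timeout_seconds: float,
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[float, float]:
    """Wait for a pod in two stages, each with its own independent timeout.

    get_status() is a hypothetical callable returning (scheduled, running).
    Returns the seconds spent in the schedule stage and in the startup stage.
    """
    # Stage 1: pod is pending until the scheduler assigns it to a node.
    schedule_start = clock()
    while True:
        scheduled, _ = get_status()
        if scheduled:
            break
        if clock() - schedule_start > schedule_timeout_seconds:
            raise PodLaunchTimeoutError(
                f"Pod not scheduled within {schedule_timeout_seconds}s"
            )
        sleep(poll_interval)
    schedule_elapsed = clock() - schedule_start

    # Stage 2: the startup timer only begins once the pod is scheduled,
    # so a stuck node (image pull, container start) is detected quickly.
    startup_start = clock()
    while True:
        _, running = get_status()
        if running:
            break
        if clock() - startup_start > startup_timeout_seconds:
            raise PodLaunchTimeoutError(
                f"Pod scheduled but not started within {startup_timeout_seconds}s"
            )
        sleep(poll_interval)
    startup_elapsed = clock() - startup_start
    return schedule_elapsed, startup_elapsed


class FakeClock:
    """Deterministic clock for the example: sleep() just advances time."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def sleep(self, seconds: float) -> None:
        self.now += seconds


clock = FakeClock()


def busy_pool_status() -> Tuple[bool, bool]:
    # Hypothetical pod: scheduled after 50 s (busy pool), running 5 s later.
    return clock.now >= 50, clock.now >= 55


print(await_pod_start(busy_pool_status, 120, 10, clock=clock, sleep=clock.sleep))
# prints (50.0, 5.0)
```

With a single combined timeout, the same pod would need a budget of at least 55 s, which would also let a node stuck in container startup go unnoticed for that long; with separate stages, the 10 s startup timeout still catches the stuck case.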



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.