jameslamb commented on a change in pull request #17953:
URL: https://github.com/apache/airflow/pull/17953#discussion_r725575828
##########
File path: airflow/providers/cncf/kubernetes/utils/pod_launcher.py
##########
@@ -128,7 +128,12 @@ def start_pod(self, pod: V1Pod, startup_timeout: int = 120):
     self.log.warning("Pod not yet started: %s", pod.metadata.name)
     delta = dt.now() - curr_time
     if delta.total_seconds() >= startup_timeout:
-        raise AirflowException("Pod took too long to start")
+        msg = (
+            f"Pod took longer than {startup_timeout} seconds to start. "
+            "Increasing 'startup_timeout' might resolve this error, but check the pod events in kubernetes "
+            "for structural errors like a missing imagePullSecret."
Review comment:
Sure! I've proposed some new language in
https://github.com/apache/airflow/pull/17953/commits/a78f858fc73f426d1205025583e694ef898adc1d.

I think another reference to timeouts (like "didn't start quickly enough")
should be avoided. My goal with this pull request is to make it clearer that
while this error is triggered by a timeout, it can be observed in situations
where configuration or permission issues mean that the pod will NEVER start
correctly.

I opened this based on my personal experience using `KubernetesPodOperator`,
where I didn't realize I had a typo in the name of an `imagePullSecret`,
observed a task failing with this error about timeouts, and spent some time
testing higher values of `startup_timeout` trying to resolve the issue.

In my experience so far with `KubernetesPodOperator`, such k8s configuration
errors are common and I hope that this error message will save others some time
by making it clear that they should check k8s errors before just trying to
increase the timeout.
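
To illustrate the failure mode described above, here is a minimal, dependency-free sketch of a startup wait loop that raises the proposed message. It is not the actual `pod_launcher.py` code: `wait_for_pod_start`, `PodStartupError`, and `poll_interval` are hypothetical names made up for this example, and `pod_not_started` is assumed to be any callable that reports whether the pod is still pending.

```python
import time
from typing import Callable


class PodStartupError(Exception):
    """Hypothetical stand-in for AirflowException, so the sketch needs no Airflow install."""


def wait_for_pod_start(
    pod_not_started: Callable[[], bool],
    startup_timeout: float = 120,
    poll_interval: float = 1.0,
) -> None:
    """Poll until the pod starts, or raise once startup_timeout seconds have elapsed."""
    start = time.monotonic()
    while pod_not_started():
        if time.monotonic() - start >= startup_timeout:
            # The message points at both possible causes: a slow start
            # and a pod that can never start because of a config error.
            raise PodStartupError(
                f"Pod took longer than {startup_timeout} seconds to start. "
                "Increasing 'startup_timeout' might resolve this error, but check the "
                "pod events in kubernetes for structural errors like a missing imagePullSecret."
            )
        time.sleep(poll_interval)
```

A pod with a misspelled `imagePullSecret` behaves like `pod_not_started = lambda: True`: the loop hits the timeout no matter how large `startup_timeout` is, which is why the message directs the reader to the pod events first.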