jedcunningham commented on a change in pull request #17953:
URL: https://github.com/apache/airflow/pull/17953#discussion_r727274888
##########
File path: airflow/providers/cncf/kubernetes/utils/pod_launcher.py
##########
@@ -128,7 +128,12 @@ def start_pod(self, pod: V1Pod, startup_timeout: int =
120):
self.log.warning("Pod not yet started: %s", pod.metadata.name)
delta = dt.now() - curr_time
if delta.total_seconds() >= startup_timeout:
- raise AirflowException("Pod took too long to start")
+ msg = (
+ f"Pod took longer than {startup_timeout} seconds to
start. "
+ "Increasing 'startup_timeout' might resolve this
error, but check the pod events in kubernetes "
+ "for structural errors like a missing imagePullSecret."
Review comment:
I see where you are going but as an example, just last week I saw a case
where no nodes had enough free resources. There are a ton of reasons why the
timeout could be missed, I'm not sure singling out structural errors makes
sense to me. I'd rather have no examples in every single exception and just
point people to the events. I hear you on the double timeouts reference though.
Maybe this instead? Short and sweet:
```
f"Pod took longer than {startup_timeout} seconds to start (startup_timeout)."
" Check the pod events in kubernetes to determine why."
```
Another factor here is that the events should also be logged in the task log
(in theory). Did this not happen for you? Feels a little weird to tell folks to
"go look at k8s pod events" when it should be logged alongside the same message
🤷♂️
##########
File path: airflow/providers/cncf/kubernetes/utils/pod_launcher.py
##########
@@ -128,7 +128,12 @@ def start_pod(self, pod: V1Pod, startup_timeout: int =
120):
self.log.warning("Pod not yet started: %s", pod.metadata.name)
delta = dt.now() - curr_time
if delta.total_seconds() >= startup_timeout:
- raise AirflowException("Pod took too long to start")
+ msg = (
+ f"Pod took longer than {startup_timeout} seconds to
start. "
+ "Increasing 'startup_timeout' might resolve this
error, but check the pod events in kubernetes "
+ "for structural errors like a missing imagePullSecret."
Review comment:
> Logging events is turned off by default
Ah, well that would do it 🙃
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]