jameslamb opened a new pull request #17953:
URL: https://github.com/apache/airflow/pull/17953


   Thanks very much for all the work that has gone into v2 of 
`KubernetesPodOperator`!
   
   This PR proposes two small changes (one for documentation, one to a log 
message in `PodLauncher`), which I think might help others in debugging task 
failures when using this operator.
   
   ## Description
   
   In my recent experience with `KubernetesPodOperator` (using `airflow` 
2.1.0), I've found that some classes of issues that cause a task to fail can be 
difficult to diagnose from only the information in the Airflow UI.
   
   For problems where Kubernetes is able to create a pod but one or more of its 
containers fails to start, I've found that the task logs in the Airflow UI look 
something like this:
   
   ```text
   [WARN] Pod not yet started: some-pod-amhk4t
   [WARN] Pod not yet started: some-pod-amhk4t
   [WARN] Pod not yet started: some-pod-amhk4t
   ...
   ...
   packages/airflow/providers/cncf/kubernetes/utils/pod_launcher.py", line 131, 
in start_pod
       raise AirflowException("Pod took too long to start")
   airflow.exceptions.AirflowException: Pod took too long to start
   ```
   
   My first thought on seeing that log message was "oh ok, maybe image pulling 
is taking a while and I just need to set a higher timeout". My second thought 
was "ok... 'too long' according to what configuration?".
   
   I've found that the Airflow task logs can look like that for any of the 
following issues:
   
   * referencing a secret that doesn't exist in the target namespace
   * requesting an `image` that your pod isn't authorized to pull (e.g., it's 
in a private repository and you failed to specify `imagePullSecrets`)
   * referencing a volume that does not exist
   
   In these cases, diagnosing the issue requires going directly to Kubernetes 
to get the pod events (for example, with `kubectl describe pod`).
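
For anyone who would rather script that lookup, here is a minimal sketch using the official `kubernetes` Python client. It is not part of this PR. The helper names `fetch_pod_events` and `format_events` are hypothetical, and it assumes a local kubeconfig with permission to list events in the target namespace.

```python
def fetch_pod_events(namespace, pod_name, api=None):
    """Return the events Kubernetes recorded for a single pod.

    `api` is any object exposing `list_namespaced_event`; when omitted,
    a CoreV1Api client is built from the local kubeconfig (an assumption,
    code running inside a cluster would load in-cluster config instead).
    """
    if api is None:
        # Imported lazily so format_events() is usable without the client.
        from kubernetes import client, config

        config.load_kube_config()
        api = client.CoreV1Api()

    # Events reference the object they describe, so filter on the pod name.
    result = api.list_namespaced_event(
        namespace=namespace,
        field_selector=f"involvedObject.name={pod_name}",
    )
    return result.items


def format_events(events):
    """Render each event as '<type> <reason>: <message>'."""
    return [f"{e.type} {e.reason}: {e.message}" for e in events]


# Example usage (requires cluster access; the namespace is an assumption):
# for line in format_events(fetch_pod_events("default", "some-pod-amhk4t")):
#     print(line)
```

For the failure modes listed above, the events for the pod are where I'd expect the underlying reason (a missing secret, a failed image pull, an unmountable volume) to show up, rather than in the Airflow task log.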
   
   I hope that the changes in this PR might save others some debugging time in 
the future.
   
   Thanks very much for your time and consideration.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


