Re: [PR] Fix kpo log_events_on_failure logs warnings at warning level [airflow]

via GitHub Fri, 29 Aug 2025 00:53:35 -0700


ketozhang commented on PR #54967:
URL: https://github.com/apache/airflow/pull/54967#issuecomment-3236093089


   Let's put it this way. What is the real error message in this 
KubernetesPodOperator task output?
   
   ```
   ...
   [2025-08-15, 18:16:49 UTC] {pod.py:1027} ERROR - Pod Event: FailedMount - 
MountVolume.MountDevice failed for volume 
"pvc-229d4d89-eb1e-45da-be2f-aa50d0799350" : rpc error: code = Internal desc = 
Failed to find device path /dev/xvdae. no device path for device "/dev/xvdae" 
volume "vol-0a93da870e4b7192e" found
   [2025-08-15, 18:16:49 UTC] {pod.py:1027} ERROR - Pod Event: 
FailedCreatePodSandBox - Failed to create pod sandbox: rpc error: code = 
DeadlineExceeded desc = context deadline exceeded
   [2025-08-15, 18:16:50 UTC] {pod.py:1025} INFO - Pod Event: Scheduled - 
Successfully assigned airflow/redacted to redacted.ec2.internal
   [2025-08-15, 18:16:50 UTC] {pod.py:1025} INFO - Pod Event: 
SuccessfulAttachVolume - AttachVolume.Attach succeeded for volume 
"pvc-229d4d89-eb1e-45da-be2f-aa50d0799350"
   [2025-08-15, 18:16:50 UTC] {pod.py:1027} ERROR - Pod Event: FailedMount - 
MountVolume.MountDevice failed for volume 
"pvc-229d4d89-eb1e-45da-be2f-aa50d0799350" : rpc error: code = Internal desc = 
Failed to find device path /dev/xvdae. no device path for device "/dev/xvdae" 
volume "vol-0a93da870e4b7192e" found
   [2025-08-15, 18:16:50 UTC] {pod.py:1027} ERROR - Pod Event: 
FailedCreatePodSandBox - Failed to create pod sandbox: rpc error: code = 
DeadlineExceeded desc = context deadline exceeded
   [2025-08-15, 18:16:50 UTC] {pod.py:1076} INFO - Deleting pod: redacted
   [2025-08-15, 18:16:51 UTC] {taskinstance.py:3336} ERROR - Task failed with 
exception
   Traceback (most recent call last):
     File 
"/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py",
 line 647, in execute_sync
       self.await_pod_start(pod=self.pod)
     File 
"/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py",
 line 582, in await_pod_start
       self.pod_manager.await_pod_start(
     File 
"/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py",
 line 419, in await_pod_start
       raise PodLaunchFailedException(
   
airflow.providers.cncf.kubernetes.utils.pod_manager.PodLaunchFailedException: 
Pod took too long to start. More than 300s. Check the pod events in kubernetes.
   During handling of the above exception, another exception occurred:
   Traceback (most recent call last):
     File 
"/home/airflow/.local/lib/python3.12/site-packages/airflow/models/taskinstance.py",
 line 776, in _execute_task
       result = _execute_callable(context=context, **execute_callable_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.12/site-packages/airflow/models/taskinstance.py",
 line 742, in _execute_callable
   ...
   ```
   
   Many of my devs get confused and report to DevOps that the K8s cluster 
is broken because they see pod event errors such as FailedMount, 
FailedScheduling, etc. However, if their KPO timeout had been longer, the 
cluster would have resolved the problem on its own.
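
To make the point concrete, here is a minimal sketch of the behaviour I'm arguing for: map the Kubernetes event type to a log level so that `Warning` events (FailedMount and friends) are logged at WARNING rather than ERROR. It uses only the standard `logging` module. The names `EVENT_TYPE_TO_LEVEL`, `level_for_event`, and `log_pod_event` are hypothetical and are not the provider's actual code; the only thing assumed from Kubernetes is that an event's type is either `Normal` or `Warning`.

```python
import logging

log = logging.getLogger("kpo_events_sketch")

# Kubernetes events carry a type of either "Normal" or "Warning".
# Hypothetical mapping: neither type is treated as a task-level ERROR.
EVENT_TYPE_TO_LEVEL = {
    "Normal": logging.INFO,
    "Warning": logging.WARNING,
}


def level_for_event(event_type: str) -> int:
    """Return the logging level for a pod event type (hypothetical helper)."""
    # Unknown types fall back to INFO instead of raising.
    return EVENT_TYPE_TO_LEVEL.get(event_type, logging.INFO)


def log_pod_event(event_type: str, reason: str, message: str) -> None:
    """Log one pod event at the level derived from its type."""
    log.log(level_for_event(event_type), "Pod Event: %s - %s", reason, message)
```

With this, the only ERROR left in the output above would be the `PodLaunchFailedException` itself, which is the message users actually need to act on.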


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
