ketozhang commented on PR #54967: URL: https://github.com/apache/airflow/pull/54967#issuecomment-3236093089
Let me put it this way: what is the real error message in this KubernetesPodOperator task output?

```
...
[2025-08-15, 18:16:49 UTC] {pod.py:1027} ERROR - Pod Event: FailedMount - MountVolume.MountDevice failed for volume "pvc-229d4d89-eb1e-45da-be2f-aa50d0799350" : rpc error: code = Internal desc = Failed to find device path /dev/xvdae. no device path for device "/dev/xvdae" volume "vol-0a93da870e4b7192e" found
[2025-08-15, 18:16:49 UTC] {pod.py:1027} ERROR - Pod Event: FailedCreatePodSandBox - Failed to create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded
[2025-08-15, 18:16:50 UTC] {pod.py:1025} INFO - Pod Event: Scheduled - Successfully assigned airflow/redacted to redacted.ec2.internal
[2025-08-15, 18:16:50 UTC] {pod.py:1025} INFO - Pod Event: SuccessfulAttachVolume - AttachVolume.Attach succeeded for volume "pvc-229d4d89-eb1e-45da-be2f-aa50d0799350"
[2025-08-15, 18:16:50 UTC] {pod.py:1027} ERROR - Pod Event: FailedMount - MountVolume.MountDevice failed for volume "pvc-229d4d89-eb1e-45da-be2f-aa50d0799350" : rpc error: code = Internal desc = Failed to find device path /dev/xvdae. no device path for device "/dev/xvdae" volume "vol-0a93da870e4b7192e" found
[2025-08-15, 18:16:50 UTC] {pod.py:1027} ERROR - Pod Event: FailedCreatePodSandBox - Failed to create pod sandbox: rpc error: code = DeadlineExceeded desc = context deadline exceeded
[2025-08-15, 18:16:50 UTC] {pod.py:1076} INFO - Deleting pod: redacted
[2025-08-15, 18:16:51 UTC] {taskinstance.py:3336} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 647, in execute_sync
    self.await_pod_start(pod=self.pod)
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 582, in await_pod_start
    self.pod_manager.await_pod_start(
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 419, in await_pod_start
    raise PodLaunchFailedException(
airflow.providers.cncf.kubernetes.utils.pod_manager.PodLaunchFailedException: Pod took too long to start. More than 300s. Check the pod events in kubernetes.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/models/taskinstance.py", line 776, in _execute_task
    result = _execute_callable(context=context, **execute_callable_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/airflow/.local/lib/python3.12/site-packages/airflow/models/taskinstance.py", line 742, in _execute_callable
    ...
```

I have many devs who get confused and report to DevOps that the K8s cluster is broken because they see pod event errors like FailedMount, FailedScheduling, etc. However, had their KPO timeout been longer, the cluster would have resolved the issue on its own.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org