AronsonDan commented on issue #37090:
URL: https://github.com/apache/airflow/issues/37090#issuecomment-1997028959

   We see this issue quite often in our deployment, and especially often when 
deferrable mode is enabled.
   
   In this case I think the pod was evicted by Karpenter, or the spot 
instance was taken back by AWS.
   With deferrable mode on, however, we saw this error frequently even though 
the pods had completed successfully.
   
   Example log:
   ```text
   [2024-03-13, 15:00:23 UTC] {pod_manager.py:607} INFO - Pod 
run-mongo-collection-sync-orchestrator-w5v994bv has phase Running
   [2024-03-13, 15:00:31 UTC] {taskinstance.py:2728} ERROR - Task failed with 
exception
   Traceback (most recent call last):
     File 
"/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py",
 line 608, in execute_sync
       self.remote_pod = self.pod_manager.await_pod_completion(
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py",
 line 602, in await_pod_completion
       remote_pod = self.read_pod(pod)
                    ^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", line 
289, in wrapped_f
       return self(f, *args, **kw)
              ^^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", line 
379, in __call__
       do = self.iter(retry_state=retry_state)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", line 
325, in iter
       raise retry_exc.reraise()
             ^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", line 
158, in reraise
       raise self.last_attempt.result()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 449, in 
result
       return self.__get_result()
              ^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 401, in 
__get_result
       raise self._exception
     File 
"/home/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", line 
382, in __call__
       result = fn(*args, **kwargs)
                ^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py",
 line 704, in read_pod
       return self._client.read_namespaced_pod(pod.metadata.name, 
pod.metadata.namespace)
              
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/api/core_v1_api.py",
 line 23483, in read_namespaced_pod
       return self.read_namespaced_pod_with_http_info(name, namespace, 
**kwargs)  # noqa: E501
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/api/core_v1_api.py",
 line 23570, in read_namespaced_pod_with_http_info
       return self.api_client.call_api(
              ^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/api_client.py",
 line 348, in call_api
       return self.__call_api(resource_path, method,
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/api_client.py",
 line 180, in __call_api
       response_data = self.request(
                       ^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/api_client.py",
 line 373, in request
       return self.rest_client.GET(url,
              ^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/rest.py", 
line 240, in GET
       return self.request("GET", url,
              ^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/rest.py", 
line 234, in request
       raise ApiException(http_resp=r)
   kubernetes.client.exceptions.ApiException: (404)
   Reason: Not Found
   HTTP response headers: HTTPHeaderDict({'Audit-Id': 
'26ba592d-8b61-4888-b9bf-5066232dd535', 'Cache-Control': 'no-cache, private', 
'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': 
'15741541-3633-4f53-9ae5-bcb4592de0f4', 'X-Kubernetes-Pf-Prioritylevel-Uid': 
'675c191c-7b76-46bc-b12f-d1820e389768', 'Date': 'Wed, 13 Mar 2024 15:00:28 
GMT', 'Content-Length': '262'})
   HTTP response body: 
{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods
 \"run-mongo-collection-sync-orchestrator-w5v994bv\" not 
found","reason":"NotFound","details":{"name":"run-mongo-collection-sync-orchestrator-w5v994bv","kind":"pods"},"code":404}
   During handling of the above exception, another exception occurred:
   Traceback (most recent call last):
     File 
"/home/airflow/.local/lib/python3.11/site-packages/airflow/models/taskinstance.py",
 line 444, in _execute_task
       result = _execute_callable(context=context, **execute_callable_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.11/site-packages/airflow/models/taskinstance.py",
 line 414, in _execute_callable
       return execute_callable(context=context, **execute_callable_kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py",
 line 554, in execute
       return self.execute_sync(context)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py",
 line 613, in execute_sync
       self.cleanup(
     File 
"/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py",
 line 736, in cleanup
       istio_enabled = self.is_istio_enabled(remote_pod)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py",
 line 798, in is_istio_enabled
       remote_pod = self.pod_manager.read_pod(pod)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", line 
289, in wrapped_f
       return self(f, *args, **kw)
              ^^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", line 
379, in __call__
       do = self.iter(retry_state=retry_state)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", line 
325, in iter
       raise retry_exc.reraise()
             ^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", line 
158, in reraise
       raise self.last_attempt.result()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 449, in 
result
       return self.__get_result()
              ^^^^^^^^^^^^^^^^^^^
     File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 401, in 
__get_result
       raise self._exception
     File 
"/home/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", line 
382, in __call__
       result = fn(*args, **kwargs)
                ^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py",
 line 704, in read_pod
       return self._client.read_namespaced_pod(pod.metadata.name, 
pod.metadata.namespace)
              
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/api/core_v1_api.py",
 line 23483, in read_namespaced_pod
       return self.read_namespaced_pod_with_http_info(name, namespace, 
**kwargs)  # noqa: E501
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/api/core_v1_api.py",
 line 23570, in read_namespaced_pod_with_http_info
       return self.api_client.call_api(
              ^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/api_client.py",
 line 348, in call_api
       return self.__call_api(resource_path, method,
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/api_client.py",
 line 180, in __call_api
       response_data = self.request(
                       ^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/api_client.py",
 line 373, in request
       return self.rest_client.GET(url,
              ^^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/rest.py", 
line 240, in GET
       return self.request("GET", url,
              ^^^^^^^^^^^^^^^^^^^^^^^^
     File 
"/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/rest.py", 
line 234, in request
       raise ApiException(http_resp=r)
   kubernetes.client.exceptions.ApiException: (404)
   Reason: Not Found
   HTTP response headers: HTTPHeaderDict({'Audit-Id': 
'd1bbf3e7-1823-4bd0-bb3c-b77af1cde651', 'Cache-Control': 'no-cache, private', 
'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': 
'15741541-3633-4f53-9ae5-bcb4592de0f4', 'X-Kubernetes-Pf-Prioritylevel-Uid': 
'675c191c-7b76-46bc-b12f-d1820e389768', 'Date': 'Wed, 13 Mar 2024 15:00:31 
GMT', 'Content-Length': '262'})
   HTTP response body: 
{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods
 \"run-mongo-collection-sync-orchestrator-w5v994bv\" not 
found","reason":"NotFound","details":{"name":"run-mongo-collection-sync-orchestrator-w5v994bv","kind":"pods"},"code":404}
   [2024-03-13, 15:00:31 UTC] {taskinstance.py:1149} INFO - Marking task as 
FAILED. dag_id=dynamic_run_mongo_collection_sync_orchestrator_dag, 
task_id=kubernetes_pod_operator, execution_date=20240313T144000, 
start_date=20240313T150016, end_date=20240313T150031
   [2024-03-13, 15:00:31 UTC] {logging_mixin.py:188} WARNING - 
/home/airflow/.local/lib/python3.11/site-packages/airflow/utils/context.py:207 
AirflowContextDeprecationWarning: Accessing 'execution_date' from the template 
is deprecated and will be removed in a future version. Please use 
'data_interval_start' or 'logical_date' instead.
   [2024-03-13, 15:00:31 UTC] {base.py:83} INFO - Using connection ID 
'slack_default' for task execution.
   [2024-03-13, 15:00:31 UTC] {standard_task_runner.py:107} ERROR - Failed to 
execute job 158458 for task kubernetes_pod_operator ((404)
   Reason: Not Found
   HTTP response headers: HTTPHeaderDict({'Audit-Id': 
'd1bbf3e7-1823-4bd0-bb3c-b77af1cde651', 'Cache-Control': 'no-cache, private', 
'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid': 
'15741541-3633-4f53-9ae5-bcb4592de0f4', 'X-Kubernetes-Pf-Prioritylevel-Uid': 
'675c191c-7b76-46bc-b12f-d1820e389768', 'Date': 'Wed, 13 Mar 2024 15:00:31 
GMT', 'Content-Length': '262'})
   HTTP response body: 
{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods
 \"run-mongo-collection-sync-orchestrator-w5v994bv\" not 
found","reason":"NotFound","details":{"name":"run-mongo-collection-sync-orchestrator-w5v994bv","kind":"pods"},"code":404}
   ; 20)
   [2024-03-13, 15:00:31 UTC] {local_task_job_runner.py:234} INFO - Task exited 
with return code 1
   [2024-03-13, 15:00:31 UTC] {taskinstance.py:3309} INFO - 0 downstream tasks 
scheduled from follow-on schedule check
   ```
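
In the log above the 404 is raised twice: first from `read_pod` inside `await_pod_completion`, then again from `read_pod` inside `is_istio_enabled` while `cleanup` is running, and the second one is what fails the task. Below is a minimal sketch of how a caller could tolerate a pod that has already disappeared. It is not the provider's code or an agreed fix. `FakeApiException` and `read_pod_or_none` are hypothetical names used only for illustration; the stand-in exception mirrors the `status` attribute of `kubernetes.client.exceptions.ApiException` so the example runs without a cluster.

```python
from typing import Any, Callable, Optional


class FakeApiException(Exception):
    """Hypothetical stand-in for kubernetes.client.exceptions.ApiException."""

    def __init__(self, status: int, reason: str = "") -> None:
        super().__init__(f"({status}) Reason: {reason}")
        self.status = status
        self.reason = reason


def read_pod_or_none(
    read_pod: Callable[[str, str], Any], name: str, namespace: str
) -> Optional[Any]:
    """Call read_pod(name, namespace); return None if the pod no longer exists.

    Only a 404 is swallowed. Any other API error is re-raised unchanged so
    real failures (403, 500, ...) still surface.
    """
    try:
        return read_pod(name, namespace)
    except FakeApiException as exc:
        if exc.status == 404:
            # Pod was deleted (for example evicted together with its node).
            return None
        raise


if __name__ == "__main__":
    def evicted_pod_reader(name: str, namespace: str) -> Any:
        # Simulates the API server answering "pods ... not found".
        raise FakeApiException(404, "Not Found")

    pod = read_pod_or_none(evicted_pod_reader, "example-pod", "default")
    if pod is None:
        print("pod is gone; skipping pod-dependent cleanup steps")
```

With real code the `except` clause would catch `ApiException` from the `kubernetes` package instead. Whether a missing pod should count as success or failure is a separate decision that this sketch does not make.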

