AronsonDan commented on issue #37090:
URL: https://github.com/apache/airflow/issues/37090#issuecomment-1997028959
I see this issue quite often in our deployment, especially when deferrable mode is enabled.
In this case I think the pod was evicted, either by Karpenter or because the spot
instance was taken back by AWS.
With deferrable mode on, however, we saw the same error quite often even though
the pods had succeeded.
Example log:
```text
[2024-03-13, 15:00:23 UTC] {pod_manager.py:607} INFO - Pod
run-mongo-collection-sync-orchestrator-w5v994bv has phase Running
[2024-03-13, 15:00:31 UTC] {taskinstance.py:2728} ERROR - Task failed with
exception
Traceback (most recent call last):
File
"/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py",
line 608, in execute_sync
self.remote_pod = self.pod_manager.await_pod_completion(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py",
line 602, in await_pod_completion
remote_pod = self.read_pod(pod)
^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", line
289, in wrapped_f
return self(f, *args, **kw)
^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", line
379, in __call__
do = self.iter(retry_state=retry_state)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", line
325, in iter
raise retry_exc.reraise()
^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", line
158, in reraise
raise self.last_attempt.result()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 449, in
result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 401, in
__get_result
raise self._exception
File
"/home/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", line
382, in __call__
result = fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py",
line 704, in read_pod
return self._client.read_namespaced_pod(pod.metadata.name,
pod.metadata.namespace)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/api/core_v1_api.py",
line 23483, in read_namespaced_pod
return self.read_namespaced_pod_with_http_info(name, namespace,
**kwargs) # noqa: E501
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/api/core_v1_api.py",
line 23570, in read_namespaced_pod_with_http_info
return self.api_client.call_api(
^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/api_client.py",
line 348, in call_api
return self.__call_api(resource_path, method,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/api_client.py",
line 180, in __call_api
response_data = self.request(
^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/api_client.py",
line 373, in request
return self.rest_client.GET(url,
^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/rest.py",
line 240, in GET
return self.request("GET", url,
^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/rest.py",
line 234, in request
raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Audit-Id':
'26ba592d-8b61-4888-b9bf-5066232dd535', 'Cache-Control': 'no-cache, private',
'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid':
'15741541-3633-4f53-9ae5-bcb4592de0f4', 'X-Kubernetes-Pf-Prioritylevel-Uid':
'675c191c-7b76-46bc-b12f-d1820e389768', 'Date': 'Wed, 13 Mar 2024 15:00:28
GMT', 'Content-Length': '262'})
HTTP response body:
{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods
\"run-mongo-collection-sync-orchestrator-w5v994bv\" not
found","reason":"NotFound","details":{"name":"run-mongo-collection-sync-orchestrator-w5v994bv","kind":"pods"},"code":404}
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File
"/home/airflow/.local/lib/python3.11/site-packages/airflow/models/taskinstance.py",
line 444, in _execute_task
result = _execute_callable(context=context, **execute_callable_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/airflow/models/taskinstance.py",
line 414, in _execute_callable
return execute_callable(context=context, **execute_callable_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py",
line 554, in execute
return self.execute_sync(context)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py",
line 613, in execute_sync
self.cleanup(
File
"/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py",
line 736, in cleanup
istio_enabled = self.is_istio_enabled(remote_pod)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py",
line 798, in is_istio_enabled
remote_pod = self.pod_manager.read_pod(pod)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", line
289, in wrapped_f
return self(f, *args, **kw)
^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", line
379, in __call__
do = self.iter(retry_state=retry_state)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", line
325, in iter
raise retry_exc.reraise()
^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", line
158, in reraise
raise self.last_attempt.result()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 449, in
result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/concurrent/futures/_base.py", line 401, in
__get_result
raise self._exception
File
"/home/airflow/.local/lib/python3.11/site-packages/tenacity/__init__.py", line
382, in __call__
result = fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py",
line 704, in read_pod
return self._client.read_namespaced_pod(pod.metadata.name,
pod.metadata.namespace)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/api/core_v1_api.py",
line 23483, in read_namespaced_pod
return self.read_namespaced_pod_with_http_info(name, namespace,
**kwargs) # noqa: E501
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/api/core_v1_api.py",
line 23570, in read_namespaced_pod_with_http_info
return self.api_client.call_api(
^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/api_client.py",
line 348, in call_api
return self.__call_api(resource_path, method,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/api_client.py",
line 180, in __call_api
response_data = self.request(
^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/api_client.py",
line 373, in request
return self.rest_client.GET(url,
^^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/rest.py",
line 240, in GET
return self.request("GET", url,
^^^^^^^^^^^^^^^^^^^^^^^^
File
"/home/airflow/.local/lib/python3.11/site-packages/kubernetes/client/rest.py",
line 234, in request
raise ApiException(http_resp=r)
kubernetes.client.exceptions.ApiException: (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Audit-Id':
'd1bbf3e7-1823-4bd0-bb3c-b77af1cde651', 'Cache-Control': 'no-cache, private',
'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid':
'15741541-3633-4f53-9ae5-bcb4592de0f4', 'X-Kubernetes-Pf-Prioritylevel-Uid':
'675c191c-7b76-46bc-b12f-d1820e389768', 'Date': 'Wed, 13 Mar 2024 15:00:31
GMT', 'Content-Length': '262'})
HTTP response body:
{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods
\"run-mongo-collection-sync-orchestrator-w5v994bv\" not
found","reason":"NotFound","details":{"name":"run-mongo-collection-sync-orchestrator-w5v994bv","kind":"pods"},"code":404}
[2024-03-13, 15:00:31 UTC] {taskinstance.py:1149} INFO - Marking task as
FAILED. dag_id=dynamic_run_mongo_collection_sync_orchestrator_dag,
task_id=kubernetes_pod_operator, execution_date=20240313T144000,
start_date=20240313T150016, end_date=20240313T150031
[2024-03-13, 15:00:31 UTC] {logging_mixin.py:188} WARNING -
/home/airflow/.local/lib/python3.11/site-packages/airflow/utils/context.py:207
AirflowContextDeprecationWarning: Accessing 'execution_date' from the template
is deprecated and will be removed in a future version. Please use
'data_interval_start' or 'logical_date' instead.
[2024-03-13, 15:00:31 UTC] {base.py:83} INFO - Using connection ID
'slack_default' for task execution.
[2024-03-13, 15:00:31 UTC] {standard_task_runner.py:107} ERROR - Failed to
execute job 158458 for task kubernetes_pod_operator ((404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Audit-Id':
'd1bbf3e7-1823-4bd0-bb3c-b77af1cde651', 'Cache-Control': 'no-cache, private',
'Content-Type': 'application/json', 'X-Kubernetes-Pf-Flowschema-Uid':
'15741541-3633-4f53-9ae5-bcb4592de0f4', 'X-Kubernetes-Pf-Prioritylevel-Uid':
'675c191c-7b76-46bc-b12f-d1820e389768', 'Date': 'Wed, 13 Mar 2024 15:00:31
GMT', 'Content-Length': '262'})
HTTP response body:
{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods
\"run-mongo-collection-sync-orchestrator-w5v994bv\" not
found","reason":"NotFound","details":{"name":"run-mongo-collection-sync-orchestrator-w5v994bv","kind":"pods"},"code":404}
; 20)
[2024-03-13, 15:00:31 UTC] {local_task_job_runner.py:234} INFO - Task exited
with return code 1
[2024-03-13, 15:00:31 UTC] {taskinstance.py:3309} INFO - 0 downstream tasks
scheduled from follow-on schedule check
```