paramjeet01 opened a new issue, #37090:
URL: https://github.com/apache/airflow/issues/37090
### Apache Airflow Provider(s)
cncf-kubernetes
### Versions of Apache Airflow Providers
apache-airflow-providers-cncf-kubernetes==7.11.0
### Apache Airflow version
2.7.3
### Operating System
Amazon Linux 2
### Deployment
Official Apache Airflow Helm Chart
### Deployment details
_No response_
### What happened
The pod ran successfully, but Airflow checked the cluster a bit later and
could not find the pod, because it had been terminated after the process
succeeded. Airflow now treats the pod as failed (not found) and retries the task.
Error message:
> File "/opt/airflow/plugins/operators/kubernetes_pod_operator.py", line 153, in execute
final_state, remote_pod, result = self.create_new_pod_for_operator(labels, launcher)
File "/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 544, in create_new_pod_for_operator
self.patch_already_checked(self.pod)
File "/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py", line 551, in patch_already_checked
self.client.patch_namespaced_pod(pod.metadata.name, pod.metadata.namespace, body)
File "/home/airflow/.local/lib/python3.9/site-packages/kubernetes/client/api/core_v1_api.py", line 16004, in patch_namespaced_pod
(data) = self.patch_namespaced_pod_with_http_info(name, namespace, body, **kwargs) # noqa: E501
File "/home/airflow/.local/lib/python3.9/site-packages/kubernetes/client/api/core_v1_api.py", line 16095, in patch_namespaced_pod_with_http_info
return self.api_client.call_api(
File "/home/airflow/.local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 340, in call_api
return self.__call_api(resource_path, method,
File "/home/airflow/.local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 172, in __call_api
response_data = self.request(
File "/home/airflow/.local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 398, in request
return self.rest_client.PATCH(url,
File "/home/airflow/.local/lib/python3.9/site-packages/kubernetes/client/rest.py", line 292, in PATCH
return self.request("PATCH", url,
File "/home/airflow/.local/lib/python3.9/site-packages/kubernetes/client/rest.py", line 231, in request
raise ApiException(http_resp=r)
kubernetes.client.rest.ApiException: (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Mon, 29 Jan 2024 10:50:19 GMT', 'Content-Length': '262'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \"mypod-7499eea656f744bf97b7c1682e68eaac\" not found","reason":"NotFound","details":{"name":"mypod-7499eea656f744bf97b7c1682e68eaac","kind":"pods"},"code":404}
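
The traceback shows the 404 surfacing from the `patch_namespaced_pod` call inside `patch_already_checked`. As a rough sketch only (the helper name `patch_pod_ignoring_not_found` is hypothetical and not part of the provider), a custom operator could tolerate a pod that has already been deleted by treating a 404 on that patch as "nothing left to patch". The sketch assumes `client` exposes `patch_namespaced_pod(name, namespace, body)` as in the traceback, and that the raised exception carries the HTTP code in a `status` attribute, as `kubernetes.client.rest.ApiException` does.

```python
def patch_pod_ignoring_not_found(client, name, namespace, body):
    """Patch a pod; return False instead of raising if the pod is already gone.

    Hypothetical helper for illustration. `client` is assumed to behave like
    kubernetes.client.CoreV1Api for the single call used here.
    """
    try:
        client.patch_namespaced_pod(name, namespace, body)
        return True
    except Exception as exc:
        # ApiException stores the HTTP status code in `.status`; any other
        # error (or any other status code) is re-raised unchanged.
        if getattr(exc, "status", None) == 404:
            return False
        raise
```

Whether silently skipping the patch is acceptable depends on why the pod is being labelled in the first place, so this is a workaround to evaluate, not a confirmed fix.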
### What you think should happen instead
Airflow should keep polling the pod status and update the task status
accordingly. Otherwise the task is retried even though the pod succeeded and
was then terminated.
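
To make the expectation concrete, here is a minimal polling sketch. It is an illustration under stated assumptions, not how the provider works: `wait_for_final_phase` and its parameters are hypothetical names, `client` is assumed to expose `read_namespaced_pod(name, namespace)` like `kubernetes.client.CoreV1Api`, and a missing pod is detected through a `status` attribute of 404 on the raised exception. The idea is to remember the last phase that was observed, so a pod that disappears between polls yields that phase instead of an unhandled error.

```python
import time


def wait_for_final_phase(client, name, namespace, poll_interval=5.0, max_polls=None):
    """Poll a pod until it reaches a terminal phase or disappears.

    Returns "Succeeded" or "Failed" when observed, otherwise the last phase
    seen before the pod vanished (None if it was never seen at all).
    """
    last_phase = None
    polls = 0
    while max_polls is None or polls < max_polls:
        polls += 1
        try:
            pod = client.read_namespaced_pod(name, namespace)
        except Exception as exc:
            if getattr(exc, "status", None) == 404:
                # Pod was deleted between polls: report what was last seen.
                return last_phase
            raise
        last_phase = pod.status.phase
        if last_phase in ("Succeeded", "Failed"):
            return last_phase
        time.sleep(poll_interval)
    return last_phase
```

A shorter `poll_interval` narrows the window in which a pod can finish and be deleted unobserved, but it cannot close that window entirely; the caller still has to decide how to treat a pod whose last known phase was `Running`.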
### How to reproduce
Run multiple long-running tasks (about 20 minutes), do an XCom push, and let
Kubernetes destroy the pod. You will notice that Airflow's call to the
Kubernetes API fails with 404 Not Found.
### Anything else
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)