paramjeet01 opened a new issue, #38288:
URL: https://github.com/apache/airflow/issues/38288
### Apache Airflow version
Other Airflow 2 version (please specify below)
### If "Other Airflow 2 version" selected, which one?
2.7.3
### What happened?
Random Kubernetes API exceptions are thrown in the Airflow scheduler:
```
[2024-03-19T14:05:19.836+0000] {kubernetes_executor.py:239} INFO - Found 0 queued task instances
[2024-03-19T14:05:23.984+0000] {kubernetes_executor_utils.py:121} ERROR - Unknown error in KubernetesJobWatcher. Failing
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/cncf/kubernetes/executors/kubernetes_executor_utils.py", line 112, in run
    self.resource_version = self._run(
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/cncf/kubernetes/executors/kubernetes_executor_utils.py", line 168, in _run
    for event in self._pod_events(kube_client=kube_client, query_kwargs=kwargs):
  File "/home/airflow/.local/lib/python3.9/site-packages/kubernetes/watch/watch.py", line 195, in stream
    raise client.rest.ApiException(
kubernetes.client.exceptions.ApiException: (410)
```
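For context, HTTP 410 (Gone) on a Kubernetes watch typically means the resource version the watcher tried to resume from is no longer available on the API server. One common way to recover is to reset the resource version and start a fresh watch. The sketch below illustrates that pattern using only the standard library; `FakeApiException`, `watch_with_restart`, and `make_flaky_stream` are hypothetical stand-ins, not Airflow or kubernetes-client code, and this is not a claim about how `KubernetesJobWatcher` is or should be implemented.

```python
class FakeApiException(Exception):
    """Hypothetical stand-in for kubernetes.client.exceptions.ApiException."""

    def __init__(self, status):
        super().__init__(f"({status})")
        self.status = status


def watch_with_restart(stream_events, max_restarts=3):
    """Consume events from stream_events(resource_version).

    On a 410 the resource version is reset to "0" and the watch is
    restarted, up to max_restarts times. Any other error is re-raised.
    Returns the list of event names seen and the number of restarts.
    """
    resource_version = "0"
    seen = []
    restarts = 0
    while True:
        try:
            for event in stream_events(resource_version):
                # Remember where we are so a clean reconnect could resume here.
                resource_version = event["resource_version"]
                seen.append(event["name"])
            return seen, restarts
        except FakeApiException as exc:
            if exc.status != 410 or restarts >= max_restarts:
                raise
            restarts += 1
            resource_version = "0"  # stale version: start over from scratch


def make_flaky_stream():
    """Build a fake event source whose first call ends with a 410."""
    calls = {"n": 0}

    def stream(resource_version):
        calls["n"] += 1
        if calls["n"] == 1:
            yield {"name": "pod-a", "resource_version": "101"}
            raise FakeApiException(410)
        yield {"name": "pod-b", "resource_version": "205"}

    return stream


if __name__ == "__main__":
    print(watch_with_restart(make_flaky_stream()))
```

Restarting from `"0"` may replay events that were already processed, so a real consumer following this pattern would need to tolerate duplicates.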
The task itself succeeded, but the task instance is then failed with the following error:
```
[2024-03-19, 14:00:57 UTC] {pod_manager.py:798} INFO - Running command... if [ -s /airflow/xcom/return.json ]; then cat /airflow/xcom/return.json; else echo __airflow_xcom_result_empty__; fi
[2024-03-19, 14:01:42 UTC] {taskinstance.py:1937} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.9/site-packages/kubernetes/stream/ws_client.py", line 523, in websocket_call
    client = WSClient(configuration, url, headers, capture_all)
  File "/home/airflow/.local/lib/python3.9/site-packages/kubernetes/stream/ws_client.py", line 65, in __init__
    self.sock = create_websocket(configuration, url, headers)
  File "/home/airflow/.local/lib/python3.9/site-packages/kubernetes/stream/ws_client.py", line 489, in create_websocket
    websocket.connect(url, **connect_opt)
  File "/home/airflow/.local/lib/python3.9/site-packages/websocket/_core.py", line 255, in connect
    self.handshake_response = handshake(self.sock, url, *addrs, **options)
  File "/home/airflow/.local/lib/python3.9/site-packages/websocket/_handshake.py", line 57, in handshake
    status, resp = _get_resp_headers(sock)
  File "/home/airflow/.local/lib/python3.9/site-packages/websocket/_handshake.py", line 150, in _get_resp_headers
    raise WebSocketBadStatusException("Handshake status {status} {message} -+-+- {headers} -+-+- {body}".format(status=status, message=status_message, headers=resp_headers, body=response_body), status, status_message, resp_headers, response_body)
websocket._exceptions.WebSocketBadStatusException: Handshake status 404 Not Found -+-+- {'audit-id': 'f06dc16c-c88c-41fb-8bea-1c993d4c0ef4', 'cache-control': 'no-cache, private', 'content-type': 'application/json', 'date': 'Tue, 19 Mar 2024 14:01:19 GMT', 'content-length': '214'} -+-+- b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \\"download-parse-0833b34s\\" not found","reason":"NotFound","details":{"name":"download-parse-0833b34s","kind":"pods"},"code":404}\n'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 730, in extract_xcom
    result = self.extract_xcom_json(pod)
  File "/home/airflow/.local/lib/python3.9/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/home/airflow/.local/lib/python3.9/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/home/airflow/.local/lib/python3.9/site-packages/tenacity/__init__.py", line 325, in iter
    raise retry_exc.reraise()
  File "/home/airflow/.local/lib/python3.9/site-packages/tenacity/__init__.py", line 158, in reraise
    raise self.last_attempt.result()
  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/home/airflow/.local/lib/python3.9/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 743, in extract_xcom_json
    kubernetes_stream(
  File "/home/airflow/.local/lib/python3.9/site-packages/kubernetes/stream/stream.py", line 35, in _websocket_request
    return api_method(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.9/site-packages/kubernetes/client/api/core_v1_api.py", line 994, in connect_get_namespaced_pod_exec
    return self.connect_get_namespaced_pod_exec_with_http_info(name, namespace, **kwargs) # noqa: E501
  File "/home/airflow/.local/lib/python3.9/site-packages/kubernetes/client/api/core_v1_api.py", line 1101, in connect_get_namespaced_pod_exec_with_http_info
    return self.api_client.call_api(
  File "/home/airflow/.local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
  File "/home/airflow/.local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
  File "/home/airflow/.local/lib/python3.9/site-packages/kubernetes/stream/ws_client.py", line 529, in websocket_call
    raise ApiException(status=0, reason=str(e))
kubernetes.client.exceptions.ApiException: (0)
Reason: Handshake status 404 Not Found -+-+- {'audit-id': 'f06dc16c-c88c-41fb-8bea-1c993d4c0ef4', 'cache-control': 'no-cache, private', 'content-type': 'application/json', 'date': 'Tue, 19 Mar 2024 14:01:19 GMT', 'content-length': '214'} -+-+- b'{"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"pods \\"download-parse-0833b34s\\" not found","reason":"NotFound","details":{"name":"download-parse-0833b34s","kind":"pods"},"code":404}\n'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/airflow/plugins/operators/kubernetes_pod_operator.py", line 207, in execute
    result = self.extract_xcom(pod=self.pod)
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py", line 557, in extract_xcom
    result = self.pod_manager.extract_xcom(pod)
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 733, in extract_xcom
    self.extract_xcom_kill(pod)
  File "/home/airflow/.local/lib/python3.9/site-packages/tenacity/__init__.py", line 289, in wrapped_f
    return self(f, *args, **kw)
  File "/home/airflow/.local/lib/python3.9/site-packages/tenacity/__init__.py", line 379, in __call__
    do = self.iter(retry_state=retry_state)
  File "/home/airflow/.local/lib/python3.9/site-packages/tenacity/__init__.py", line 325, in iter
    raise retry_exc.reraise()
  File "/home/airflow/.local/lib/python3.9/site-packages/tenacity/__init__.py", line 158, in reraise
    raise self.last_attempt.result()
  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/usr/local/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
  File "/home/airflow/.local/lib/python3.9/site-packages/tenacity/__init__.py", line 382, in __call__
    result = fn(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.9/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py", line 779, in extract_xcom_kill
    kubernetes_stream(
  File "/home/airflow/.local/lib/python3.9/site-packages/kubernetes/stream/stream.py", line 35, in _websocket_request
    return api_method(*args, **kwargs)
  File "/home/airflow/.local/lib/python3.9/site-packages/kubernetes/client/api/core_v1_api.py", line 994, in connect_get_namespaced_pod_exec
    return self.connect_get_namespaced_pod_exec_with_http_info(name, namespace, **kwargs) # noqa: E501
  File "/home/airflow/.local/lib/python3.9/site-packages/kubernetes/client/api/core_v1_api.py", line 1101, in connect_get_namespaced_pod_exec_with_http_info
    return self.api_client.call_api(
  File "/home/airflow/.local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 348, in call_api
    return self.__call_api(resource_path, method,
  File "/home/airflow/.local/lib/python3.9/site-packages/kubernetes/client/api_client.py", line 180, in __call_api
    response_data = self.request(
  File "/home/airflow/.local/lib/python3.9/site-packages/kubernetes/stream/ws_client.py", line 529, in websocket_call
    raise ApiException(status=0, reason=str(e)
```
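In the traceback above, the pod-not-found condition does not surface as `ApiException` with status 404. The websocket client wraps it as status `0`, and the real HTTP status only appears inside the `reason` text ("Handshake status 404 Not Found"). The sketch below shows one possible way a caller could recognise that case. It is standard-library only: `FakeApiException`, `pod_is_gone`, and `extract_xcom_or_none` are hypothetical names for illustration, not part of Airflow or the kubernetes client, and returning `None` when the pod is gone is an assumption about the desired policy rather than existing behaviour.

```python
import re


class FakeApiException(Exception):
    """Hypothetical stand-in for kubernetes.client.exceptions.ApiException."""

    def __init__(self, status, reason=""):
        super().__init__(f"({status})\nReason: {reason}")
        self.status = status
        self.reason = reason


# Matches the text seen in the log: "Handshake status 404 Not Found ..."
_HANDSHAKE_STATUS = re.compile(r"Handshake status (\d{3})")


def pod_is_gone(exc):
    """Return True when the exception indicates the pod no longer exists."""
    if exc.status == 404:
        return True
    # Websocket failures arrive as status=0; the HTTP status is in `reason`.
    match = _HANDSHAKE_STATUS.search(exc.reason or "")
    return match is not None and match.group(1) == "404"


def extract_xcom_or_none(extract, pod_name):
    """Call extract(pod_name); return None if the pod has already vanished.

    Any other API error is re-raised unchanged.
    """
    try:
        return extract(pod_name)
    except FakeApiException as exc:
        if pod_is_gone(exc):
            return None
        raise


if __name__ == "__main__":
    def missing_pod(name):
        raise FakeApiException(0, "Handshake status 404 Not Found -+-+- ...")

    print(extract_xcom_or_none(missing_pod, "download-parse-0833b34s"))
```

Matching on the `reason` string is fragile, since it depends on the exact message format shown in this log, so it is best read as a diagnostic aid rather than a proposed fix.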
### What you think should happen instead?
The task should be marked as successful.
### How to reproduce
Run a task that uses the XCom sidecar container.
### Operating System
Amazon Linux 2
### Versions of Apache Airflow Providers
_No response_
### Deployment
Official Apache Airflow Helm Chart
### Deployment details
_No response_
### Anything else?
_No response_
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)