paramjeet01 opened a new issue, #39267:
URL: https://github.com/apache/airflow/issues/39267
### Apache Airflow version
Other Airflow 2 version (please specify below)
### If "Other Airflow 2 version" selected, which one?
2.8.3
### What happened?
We are seeing an intermittent JSON decoding error while the XCom result is extracted from the pod; the task succeeds on the next retry.
```
[2024-04-26, 00:21:32 IST] {pod_manager.py:718} INFO - Checking if xcom
sidecar container is started.
[2024-04-26, 00:21:32 IST] {pod_manager.py:721} INFO - The xcom sidecar
container is started.
[2024-04-26, 00:21:32 IST] {pod_manager.py:798} INFO - Running command... if
[ -s /airflow/xcom/return.json ]; then cat /airflow/xcom/return.json; else echo
__airflow_xcom_result_empty__; fi
[2024-04-26, 00:21:36 IST] {pod_manager.py:798} INFO - Running command... if
[ -s /airflow/xcom/return.json ]; then cat /airflow/xcom/return.json; else echo
__airflow_xcom_result_empty__; fi
[2024-04-26, 00:21:40 IST] {pod_manager.py:798} INFO - Running command... if
[ -s /airflow/xcom/return.json ]; then cat /airflow/xcom/return.json; else echo
__airflow_xcom_result_empty__; fi
[2024-04-26, 00:21:44 IST] {pod_manager.py:798} INFO - Running command... if
[ -s /airflow/xcom/return.json ]; then cat /airflow/xcom/return.json; else echo
__airflow_xcom_result_empty__; fi
[2024-04-26, 00:21:52 IST] {pod_manager.py:798} INFO - Running command... if
[ -s /airflow/xcom/return.json ]; then cat /airflow/xcom/return.json; else echo
__airflow_xcom_result_empty__; fi
[2024-04-26, 00:21:52 IST] {pod_manager.py:798} INFO - Running command...
kill -s SIGINT 1
[2024-04-26, 00:21:52 IST] {pod.py:909} INFO - Deleting pod:
hpaev5-hevc-smp-manifest-generation-9lqygp1s
[2024-04-26, 00:21:52 IST] {taskinstance.py:2731} ERROR - Task failed with
exception
Traceback (most recent call last):
File "/opt/airflow/plugins/operators/kubernetes_pod_operator.py", line
200, in execute
result = self.extract_xcom(pod=self.pod)
File
"/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py",
line 557, in extract_xcom
result = self.pod_manager.extract_xcom(pod)
File
"/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py",
line 730, in extract_xcom
result = self.extract_xcom_json(pod)
File
"/home/airflow/.local/lib/python3.10/site-packages/tenacity/__init__.py", line
289, in wrapped_f
return self(f, *args, **kw)
File
"/home/airflow/.local/lib/python3.10/site-packages/tenacity/__init__.py", line
379, in __call__
do = self.iter(retry_state=retry_state)
File
"/home/airflow/.local/lib/python3.10/site-packages/tenacity/__init__.py", line
325, in iter
raise retry_exc.reraise()
File
"/home/airflow/.local/lib/python3.10/site-packages/tenacity/__init__.py", line
158, in reraise
raise self.last_attempt.result()
File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 451, in
result
return self.__get_result()
File "/usr/local/lib/python3.10/concurrent/futures/_base.py", line 403, in
__get_result
raise self._exception
File
"/home/airflow/.local/lib/python3.10/site-packages/tenacity/__init__.py", line
382, in __call__
result = fn(*args, **kwargs)
File
"/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/cncf/kubernetes/utils/pod_manager.py",
line 765, in extract_xcom_json
json.loads(result)
File "/usr/local/lib/python3.10/json/__init__.py", line 346, in loads
return _default_decoder.decode(s)
File "/usr/local/lib/python3.10/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/lib/python3.10/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 16385 (char
16384)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File
"/home/airflow/.local/lib/python3.10/site-packages/airflow/models/taskinstance.py",
line 439, in _execute_task
result = _execute_callable(context=context, **execute_callable_kwargs)
File
"/home/airflow/.local/lib/python3.10/site-packages/airflow/models/taskinstance.py",
line 414, in _execute_callable
return execute_callable(context=context, **execute_callable_kwargs)
File "/opt/airflow/plugins/operators/kubernetes_pod_operator.py", line
215, in execute
self.cleanup(
File
"/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/cncf/kubernetes/operators/pod.py",
line 839, in cleanup
raise AirflowException
```
### What you think should happen instead?
The task should not fail while extracting XCom data.
### How to reproduce
This can be reproduced by writing about 20k characters of JSON to the XCom
file (/airflow/xcom/return.json); the task then fails intermittently while
reading the data back.
I'll investigate the cause of the XCom JSON issue.
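The failing offset in the traceback (char 16384) is exactly 16 KiB, which would be consistent with the JSON being cut off partway through the read, although that is an assumption and not confirmed here. A minimal sketch, using a made-up payload, of how a truncated ~20k-character document fails in `json.loads`:

```python
import json

# Hypothetical payload of about 20k characters; the real return.json is not shown
# in this report, so this list of integers is only a stand-in.
payload = "[" + ", ".join(["1"] * 7000) + "]"

# Assumption: only the first 16384 characters arrive from the sidecar container.
truncated = payload[:16384]

error = None
try:
    json.loads(truncated)
except json.JSONDecodeError as exc:
    # The parser runs out of input where it expects the next value.
    error = exc

print(len(payload), "characters in the full payload")
print("truncated parse failed:", error)
print("full parse ok:", len(json.loads(payload)), "items")
```

The full payload parses without error, so in this sketch the failure comes only from the missing tail, not from the content itself.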
### Operating System
Amazon Linux 2
### Versions of Apache Airflow Providers
pytest>=6.2.5
docker>=5.0.0
crypto>=1.4.1
cryptography>=3.4.7
pyOpenSSL>=20.0.1
ndg-httpsclient>=0.5.1
boto3>=1.34.0
sqlalchemy
redis>=3.5.3
requests>=2.26.0
pysftp>=0.2.9
werkzeug>=1.0.1
apache-airflow-providers-cncf-kubernetes==8.0.0
apache-airflow-providers-amazon>=8.13.0
psycopg2>=2.8.5
grpcio>=1.37.1
grpcio-tools>=1.37.1
protobuf>=3.15.8,<=3.21
python-dateutil>=2.8.2
jira>=3.1.1
confluent_kafka>=1.7.0
pyarrow>=10.0.1,<10.1.0
### Deployment
Official Apache Airflow Helm Chart
### Deployment details
Official helm chart deployment.
### Anything else?
I think we are facing an issue similar to
https://github.com/apache/airflow/issues/32111, which was fixed in
https://github.com/apache/airflow/pull/32113/files. We might need to
increase the retry count.
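To illustrate the retry idea, here is a rough sketch of re-reading the payload until it parses, with a configurable number of attempts. It uses only the standard library; `read_json_with_retries` and `fetch` are hypothetical names for illustration and are not the provider's actual code, which the traceback shows is wrapped with tenacity.

```python
import json
import time


def read_json_with_retries(fetch, attempts=5, delay=0.0):
    """Call fetch() until its output parses as JSON or attempts run out.

    fetch is a hypothetical callable returning the raw text read from the
    sidecar; attempts and delay are illustrative parameters.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        raw = fetch()
        try:
            return json.loads(raw)
        except json.JSONDecodeError as err:
            last_error = err
            if attempt < attempts:
                time.sleep(delay)
    # Every attempt returned unparsable text: surface the last parse error.
    raise last_error


# Usage: the first read is cut short, the second one is complete.
responses = iter(['{"rows": [1, 2', '{"rows": [1, 2, 3]}'])
result = read_json_with_retries(lambda: next(responses), attempts=3)
print(result)
```

A higher attempt count only helps if a later read returns the complete file; if every read is cut at the same offset, the retries will all fail in the same way.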
### Are you willing to submit PR?
- [ ] Yes I am willing to submit a PR!
### Code of Conduct
- [X] I agree to follow this project's [Code of
Conduct](https://github.com/apache/airflow/blob/main/CODE_OF_CONDUCT.md)