jherrmannNetfonds commented on PR #31798:
URL: https://github.com/apache/airflow/pull/31798#issuecomment-1594346774
Hi, thanks for the PR.
I am testing this in production right now with Airflow 2.6.1 running on
Kubernetes with KubernetesExecutor. I encountered this error today:
```
[2023-06-16, 09:44:04 CEST] {taskinstance.py:1824} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/urllib3/response.py", line 761, in _update_chunk_length
    self.chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/urllib3/response.py", line 444, in _error_catcher
    yield
  File "/home/airflow/.local/lib/python3.10/site-packages/urllib3/response.py", line 828, in read_chunked
    self._update_chunk_length()
  File "/home/airflow/.local/lib/python3.10/site-packages/urllib3/response.py", line 765, in _update_chunk_length
    raise InvalidChunkLength(self, line)
urllib3.exceptions.InvalidChunkLength: InvalidChunkLength(got length b'', 0 bytes read)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/airflow/dags/repo/packages/local_copy_of_this_pr/spark_kubernetes.py", line 122, in execute
    for line in pod_log_stream:
  File "/home/airflow/.local/lib/python3.10/site-packages/kubernetes/watch/watch.py", line 165, in stream
    for line in iter_resp_lines(resp):
  File "/home/airflow/.local/lib/python3.10/site-packages/kubernetes/watch/watch.py", line 56, in iter_resp_lines
    for seg in resp.stream(amt=None, decode_content=False):
  File "/home/airflow/.local/lib/python3.10/site-packages/urllib3/response.py", line 624, in stream
    for line in self.read_chunked(amt, decode_content=decode_content):
  File "/home/airflow/.local/lib/python3.10/site-packages/urllib3/response.py", line 816, in read_chunked
    with self._error_catcher():
  File "/usr/local/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/home/airflow/.local/lib/python3.10/site-packages/urllib3/response.py", line 461, in _error_catcher
    raise ProtocolError("Connection broken: %r" % e, e)
```
This fails the task, but the Spark application is still running and
succeeds. Maybe some of these types of errors should be caught instead of
letting the pod fail. What do you think?
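
To make the suggestion concrete, here is a minimal sketch of what catching a broken log stream could look like. It is not the code from this PR: `follow_logs`, `open_stream`, and `max_reconnects` are hypothetical names, and `BrokenStreamError` is a stand-in so the snippet runs with the standard library only. In the operator, the retriable exception would presumably be `urllib3.exceptions.ProtocolError`, as seen in the traceback above.

```python
from typing import Callable, Iterable, Iterator, Tuple, Type


class BrokenStreamError(Exception):
    """Stand-in for urllib3.exceptions.ProtocolError (assumption, demo only)."""


def follow_logs(
    open_stream: Callable[[], Iterable[str]],
    retriable: Tuple[Type[BaseException], ...] = (BrokenStreamError,),
    max_reconnects: int = 3,
) -> Iterator[str]:
    """Yield log lines, reopening the stream if it breaks mid-read.

    open_stream is a hypothetical callable that returns a fresh iterable of
    log lines each time it is called.
    """
    reconnects = 0
    while True:
        try:
            for line in open_stream():
                yield line
            return  # the stream ended cleanly, nothing left to follow
        except retriable:
            reconnects += 1
            if reconnects > max_reconnects:
                raise  # give up and let the task fail as it does today
            # otherwise loop around and open a new stream
```

A reopened stream may replay lines that were already printed unless the caller resumes from the last seen position, so this only covers the "do not fail the task" part. Whether the task should then keep polling the Spark application status is a separate question.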