jherrmannNetfonds commented on PR #31798:
URL: https://github.com/apache/airflow/pull/31798#issuecomment-1594346774

   Hi, thanks for the PR.
   I am testing this in production right now with Airflow 2.6.1 running on 
Kubernetes with the KubernetesExecutor. I encountered this error today:
   
   ```
   [2023-06-16, 09:44:04 CEST] {taskinstance.py:1824} ERROR - Task failed with exception
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.10/site-packages/urllib3/response.py", line 761, in _update_chunk_length
       self.chunk_left = int(line, 16)
   ValueError: invalid literal for int() with base 16: b''
   During handling of the above exception, another exception occurred:
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.10/site-packages/urllib3/response.py", line 444, in _error_catcher
       yield
     File "/home/airflow/.local/lib/python3.10/site-packages/urllib3/response.py", line 828, in read_chunked
       self._update_chunk_length()
     File "/home/airflow/.local/lib/python3.10/site-packages/urllib3/response.py", line 765, in _update_chunk_length
       raise InvalidChunkLength(self, line)
   urllib3.exceptions.InvalidChunkLength: InvalidChunkLength(got length b'', 0 bytes read)
   During handling of the above exception, another exception occurred:
   Traceback (most recent call last):
     File "/opt/airflow/dags/repo/packages/local_copy_of_this_pr/spark_kubernetes.py", line 122, in execute
       for line in pod_log_stream:
     File "/home/airflow/.local/lib/python3.10/site-packages/kubernetes/watch/watch.py", line 165, in stream
       for line in iter_resp_lines(resp):
     File "/home/airflow/.local/lib/python3.10/site-packages/kubernetes/watch/watch.py", line 56, in iter_resp_lines
       for seg in resp.stream(amt=None, decode_content=False):
     File "/home/airflow/.local/lib/python3.10/site-packages/urllib3/response.py", line 624, in stream
       for line in self.read_chunked(amt, decode_content=decode_content):
     File "/home/airflow/.local/lib/python3.10/site-packages/urllib3/response.py", line 816, in read_chunked
       with self._error_catcher():
     File "/usr/local/lib/python3.10/contextlib.py", line 153, in __exit__
       self.gen.throw(typ, value, traceback)
     File "/home/airflow/.local/lib/python3.10/site-packages/urllib3/response.py", line 461, in _error_catcher
       raise ProtocolError("Connection broken: %r" % e, e)
   ```
   This fails the task, but the Spark application keeps running and 
succeeds. Maybe some of these types of errors should be caught instead of 
letting the pod fail. What do you think?
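
To illustrate the idea of catching such errors, here is a minimal sketch of a wrapper that reopens an interrupted log stream a limited number of times instead of failing right away. It is not the PR's code: `iter_with_reconnect`, `open_stream`, `retryable`, `max_retries`, and `delay_seconds` are hypothetical names, and the sketch assumes the caller can supply a function that opens a fresh log stream.

```python
import time
from typing import Callable, Iterable, Iterator, Tuple, Type


def iter_with_reconnect(
    open_stream: Callable[[], Iterable[str]],
    retryable: Tuple[Type[BaseException], ...],
    max_retries: int = 3,
    delay_seconds: float = 1.0,
) -> Iterator[str]:
    """Yield lines from open_stream(), reopening it after a retryable error.

    Hypothetical helper: open_stream is assumed to return a new iterable of
    log lines each time it is called.
    """
    failures = 0
    while True:
        try:
            for line in open_stream():
                failures = 0  # progress was made, so reset the failure count
                yield line
            return  # the stream ended normally
        except retryable:
            failures += 1
            if failures > max_retries:
                raise  # give up and let the task fail as it does today
            time.sleep(delay_seconds)
```

In the operator, `retryable` could be `(urllib3.exceptions.ProtocolError,)`, the exception type raised at the end of the traceback above. Note that a reopened stream may replay lines that were already logged unless `open_stream` resumes from where it stopped, for example through the `since_seconds` argument of `read_namespaced_pod_log`.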


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
