dmateusp commented on issue #12136:
URL: https://github.com/apache/airflow/issues/12136#issuecomment-754002352


   I was able to reproduce this issue after a few days of letting Bash 
containers run from KubernetesPodOperator, and implement a fix:
   
   I ran two bash tasks, one with the fix (EarnestKubernetesPodOperator) and 
one without (KubernetesPodOperator):
   
   ```python
       bash_echo_1 = KubernetesPodOperator(
           task_id="bash_echo_1",
           image="bash:4.4",
           name="bash-echo-1",
           cmds=[
               "bash",
               "-c",
               "while true; do sleep 5; done;",
           ],
           do_xcom_push=True,
           get_logs=True,
           **t.DEFAULT_POD_ARGS,
       )
   
       bash_echo_2 = EarnestKubernetesPodOperator(
           task_id="bash_echo_2",
           name="bash-echo-2",
           image="bash:4.4",
           cmds=["bash", "-c", "while true; do sleep 5; done;",],
           get_logs=True,
           **t.DEFAULT_POD_ARGS,
       )
   ```
   
   I let both of these run for a few days and the one without the fix failed a 
few times while the one with the fix never failed:
   
   <img width="95" alt="Screen Shot 2021-01-04 at 14 07 50" 
src="https://user-images.githubusercontent.com/8832050/103543765-f3f65480-4e96-11eb-9b7d-05f71a9f1036.png";>
   
   ^ First row is `bash_echo_1` and second row is `bash_echo_2`
   
   In the logs of `bash_echo_1` we can see that the issue was reproduced after 
3 hours of running:
   
   ```
   [2020-12-31 03:00:36,636] {logging_mixin.py:112} INFO - [2020-12-31 
03:00:36,635] {pod_launcher.py:142} INFO - Event: bash-echo-1-3a123499 had an 
event of type Running
   [2021-01-01 06:12:42,517] {taskinstance.py:1145} ERROR - ('Connection 
broken: IncompleteRead(0 bytes read)', IncompleteRead(0 bytes read))
   Traceback (most recent call last):
     File 
"/home/airflow/.local/lib/python3.7/site-packages/urllib3/response.py", line 
685, in _update_chunk_length
       self.chunk_left = int(line, 16)
   ValueError: invalid literal for int() with base 16: b''
   
   During handling of the above exception, another exception occurred:
   
   Traceback (most recent call last):
     File 
"/home/airflow/.local/lib/python3.7/site-packages/urllib3/response.py", line 
425, in _error_catcher
       yield
     File 
"/home/airflow/.local/lib/python3.7/site-packages/urllib3/response.py", line 
752, in read_chunked
       self._update_chunk_length()
     File 
"/home/airflow/.local/lib/python3.7/site-packages/urllib3/response.py", line 
689, in _update_chunk_length
       raise httplib.IncompleteRead(line)
   http.client.IncompleteRead: IncompleteRead(0 bytes read)
   
   During handling of the above exception, another exception occurred:
   
   Traceback (most recent call last):
     File 
"/home/airflow/.local/lib/python3.7/site-packages/airflow/models/taskinstance.py",
 line 983, in _run_raw_task
       result = task_copy.execute(context=context)
     File 
"/home/airflow/.local/lib/python3.7/site-packages/airflow/contrib/operators/kubernetes_pod_operator.py",
 line 251, in execute
       get_logs=self.get_logs)
     File 
"/home/airflow/.local/lib/python3.7/site-packages/airflow/contrib/kubernetes/pod_launcher.py",
 line 117, in run_pod
       return self._monitor_pod(pod, get_logs)
     File 
"/home/airflow/.local/lib/python3.7/site-packages/airflow/contrib/kubernetes/pod_launcher.py",
 line 124, in _monitor_pod
       for line in logs:
     File 
"/home/airflow/.local/lib/python3.7/site-packages/urllib3/response.py", line 
796, in __iter__
       for chunk in self.stream(decode_content=True):
     File 
"/home/airflow/.local/lib/python3.7/site-packages/urllib3/response.py", line 
560, in stream
       for line in self.read_chunked(amt, decode_content=decode_content):
     File 
"/home/airflow/.local/lib/python3.7/site-packages/urllib3/response.py", line 
781, in read_chunked
       self._original_response.close()
     File "/usr/local/lib/python3.7/contextlib.py", line 130, in __exit__
       self.gen.throw(type, value, traceback)
     File 
"/home/airflow/.local/lib/python3.7/site-packages/urllib3/response.py", line 
443, in _error_catcher
       raise ProtocolError("Connection broken: %r" % e, e)
   urllib3.exceptions.ProtocolError: ('Connection broken: IncompleteRead(0 
bytes read)', IncompleteRead(0 bytes read))
   ```
   
   While the same issue happened in `bash_echo_2` but it was handled by the 
change in this PR
   
   ```
   [2020-12-31 03:00:36,592] {logging_mixin.py:112} INFO - [2020-12-31 
03:00:36,592] {kubernetes_pod_operator.py:270} INFO - Event: 
bash-echo-2-72839c01adc84a29b3903527d6e6dfba had an event of type Running
   [2021-01-01 06:20:38,083] {logging_mixin.py:112} INFO - [2021-01-01 
06:20:38,082] {kubernetes_pod_operator.py:248} INFO - The pod has not logged 
since the logs were last fetched
   [2021-01-01 06:20:38,373] {logging_mixin.py:112} INFO - [2021-01-01 
06:20:38,372] {kubernetes_pod_operator.py:270} INFO - Event: 
bash-echo-2-72839c01adc84a29b3903527d6e6dfba had an event of type Running
   ```
   
   The only change was to add error handling around `self.read_pod_logs` in the 
KubernetesPodOperator:
   
   ```python
               try:
                   logs = self.read_pod_logs(pod)
                   for line in logs:
                       self.log.info(line)
               except ProtocolError as pe:
                   _, protocol_exception = pe.args
                   # When no logs are fetched, an IncompleteRead is thrown 
trying
                   # to decode the stream
                   if str(protocol_exception) == "IncompleteRead(0 bytes read)":
                       self.log.info(
                           "The pod has not logged since the logs were last 
fetched"
                       )
                   # If the exception is not about an empty stream we raise it
                   else:
                       raise e
   ```


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to