stoiandl edited a comment on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-784619202


   I managed to fix my restart problem by setting the following configs:
   ```
   [kubernetes]
   ...
   kube_client_request_args = { "_request_timeout": 60 }
   delete_option_kwargs = {"grace_period_seconds": 10}
   enable_tcp_keepalive = True
   tcp_keep_idle = 30
   tcp_keep_intvl = 30
   tcp_keep_cnt = 30
   ```
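The `tcp_keep_*` settings above turn on TCP keepalive for the scheduler's connections to the API server. An idle watch connection is then probed periodically instead of silently going stale.

The sketch below shows what that means at the socket level. It assumes the three values map onto the Linux `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` socket options. The helper names are hypothetical, and this is not Airflow's own code:

```python
import socket

# Values copied from the [kubernetes] section above.
TCP_KEEP_IDLE = 30   # idle seconds before the first keepalive probe
TCP_KEEP_INTVL = 30  # seconds between unanswered probes
TCP_KEEP_CNT = 30    # unanswered probes before the peer is considered dead


def keepalive_socket_options(idle, intvl, cnt):
    """Return (level, option, value) tuples that enable TCP keepalive.

    TCP_KEEPIDLE / TCP_KEEPINTVL / TCP_KEEPCNT are platform-dependent, so only
    the options this Python build exposes are included.
    """
    options = [(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)]
    for name, value in (("TCP_KEEPIDLE", idle),
                        ("TCP_KEEPINTVL", intvl),
                        ("TCP_KEEPCNT", cnt)):
        opt = getattr(socket, name, None)
        if opt is not None:
            options.append((socket.IPPROTO_TCP, opt, value))
    return options


def dead_peer_detection_seconds(idle, intvl, cnt):
    """Approximate worst-case seconds until an unresponsive peer is dropped (Linux semantics)."""
    return idle + intvl * cnt


if __name__ == "__main__":
    opts = keepalive_socket_options(TCP_KEEP_IDLE, TCP_KEEP_INTVL, TCP_KEEP_CNT)
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        for level, opt, value in opts:
            sock.setsockopt(level, opt, value)
        print("SO_KEEPALIVE enabled:", sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    print("dead peer detected after ~",
          dead_peer_detection_seconds(TCP_KEEP_IDLE, TCP_KEEP_INTVL, TCP_KEEP_CNT), "s")
```

The resulting list has the `(level, option, value)` shape that urllib3 accepts through its `socket_options` connection argument. That is presumably how `enable_tcp_keepalive` reaches the Kubernetes client.

With these values, a dead peer would only be detected after about 30 + 30 × 30 = 930 s. The practical benefit is therefore likely the regular 30 s probes keeping the connection active, rather than fast failure detection.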
   
   I have another Airflow instance running on Kubernetes in AWS, and that one runs fine with any version.
   
   I realized the problem is specific to Azure Kubernetes, namely the REST API calls to the API server. I can now see some Kubernetes connection errors in the scheduler logs. They are not fatal, though: Airflow retries the connection and succeeds on the second attempt.
   
   Now my logs are full of connection errors:
   ```
   [2021-02-24 00:44:06,188] {kubernetes_executor.py:106} WARNING - There was a timeout error accessing the Kube API. Retrying request.
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/contrib/pyopenssl.py", line 313, in recv_into
       return self.connection.recv_into(*args, **kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/OpenSSL/SSL.py", line 1840, in recv_into
       self._raise_ssl_error(self._ssl, result)
     File "/home/airflow/.local/lib/python3.8/site-packages/OpenSSL/SSL.py", line 1646, in _raise_ssl_error
       raise WantReadError()
   OpenSSL.SSL.WantReadError

   During handling of the above exception, another exception occurred:

   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line 436, in _error_catcher
       yield
     File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line 763, in read_chunked
       self._update_chunk_length()
     File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line 693, in _update_chunk_length
       line = self._fp.fp.readline()
     File "/usr/local/lib/python3.8/socket.py", line 669, in readinto
       return self._sock.recv_into(b)
     File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/contrib/pyopenssl.py", line 326, in recv_into
       raise timeout("The read operation timed out")
   socket.timeout: The read operation timed out

   During handling of the above exception, another exception occurred:

   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/executors/kubernetes_executor.py", line 102, in run
       self.resource_version = self._run(
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/executors/kubernetes_executor.py", line 145, in _run
       for event in list_worker_pods():
     File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/watch/watch.py", line 144, in stream
       for line in iter_resp_lines(resp):
     File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/watch/watch.py", line 46, in iter_resp_lines
       for seg in resp.read_chunked(decode_content=False):
     File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line 792, in read_chunked
       self._original_response.close()
     File "/usr/local/lib/python3.8/contextlib.py", line 131, in __exit__
       self.gen.throw(type, value, traceback)
     File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line 441, in _error_catcher
       raise ReadTimeoutError(self._pool, None, "Read timed out.")
   urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='10.0.0.1', port=443): Read timed out.
   [2021-02-24 00:44:07,330] {kubernetes_executor.py:126} INFO - Event: and now my watch begins starting at resource_version: 0
   [2021-02-24 00:45:07,394] {kubernetes_executor.py:106} WARNING - There was a timeout error accessing the Kube API. Retrying request.
   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/contrib/pyopenssl.py", line 313, in recv_into
       return self.connection.recv_into(*args, **kwargs)
     File "/home/airflow/.local/lib/python3.8/site-packages/OpenSSL/SSL.py", line 1840, in recv_into
       self._raise_ssl_error(self._ssl, result)
     File "/home/airflow/.local/lib/python3.8/site-packages/OpenSSL/SSL.py", line 1646, in _raise_ssl_error
       raise WantReadError()
   OpenSSL.SSL.WantReadError

   During handling of the above exception, another exception occurred:

   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line 436, in _error_catcher
       yield
     File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line 763, in read_chunked
       self._update_chunk_length()
     File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line 693, in _update_chunk_length
       line = self._fp.fp.readline()
     File "/usr/local/lib/python3.8/socket.py", line 669, in readinto
       return self._sock.recv_into(b)
     File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/contrib/pyopenssl.py", line 326, in recv_into
       raise timeout("The read operation timed out")
   socket.timeout: The read operation timed out

   During handling of the above exception, another exception occurred:

   Traceback (most recent call last):
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/executors/kubernetes_executor.py", line 102, in run
       self.resource_version = self._run(
     File "/home/airflow/.local/lib/python3.8/site-packages/airflow/executors/kubernetes_executor.py", line 145, in _run
       for event in list_worker_pods():
     File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/watch/watch.py", line 144, in stream
       for line in iter_resp_lines(resp):
     File "/home/airflow/.local/lib/python3.8/site-packages/kubernetes/watch/watch.py", line 46, in iter_resp_lines
       for seg in resp.read_chunked(decode_content=False):
     File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line 792, in read_chunked
       self._original_response.close()
     File "/usr/local/lib/python3.8/contextlib.py", line 131, in __exit__
       self.gen.throw(type, value, traceback)
     File "/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line 441, in _error_catcher
       raise ReadTimeoutError(self._pool, None, "Read timed out.")
   urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='10.0.0.1', port=443): Read timed out.
   ```
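The timestamps are consistent with these timeouts being driven by the 60-second `_request_timeout` configured above. The watch restarts at 00:44:07,330 and the next timeout warning is logged at 00:45:07,394.

A quick check of that gap, using only the two timestamps copied from the log:

```python
from datetime import datetime

# Timestamps copied from the log above: the watch restart (INFO line)
# and the next timeout warning.
WATCH_RESTARTED = "2021-02-24 00:44:07,330"
NEXT_TIMEOUT = "2021-02-24 00:45:07,394"


def parse_log_ts(ts: str) -> datetime:
    """Parse 'YYYY-MM-DD HH:MM:SS,mmm'; %f right-pads '330' to 330000 microseconds."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f")


gap = (parse_log_ts(NEXT_TIMEOUT) - parse_log_ts(WATCH_RESTARTED)).total_seconds()
print(f"gap between watch restart and next timeout: {gap:.3f} s")  # 60.064 s
```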
   
   It is not the prettiest solution, but it works. This definitely needs to be addressed somehow.

