stoiandl edited a comment on issue #14261:
URL: https://github.com/apache/airflow/issues/14261#issuecomment-784619202
I managed to fix my restart issue by setting the following configs:
```
[kubernetes]
...
kube_client_request_args = { "_request_timeout": 60 }
delete_option_kwargs = {"grace_period_seconds": 10}
enable_tcp_keepalive = True
tcp_keep_idle = 30
tcp_keep_intvl = 30
tcp_keep_cnt = 30
```
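For anyone wondering what the keepalive values do, here is a minimal sketch of the socket options they would correspond to. This is an illustration only, not Airflow's code: the helper names are hypothetical, and I am assuming that `tcp_keep_idle`, `tcp_keep_intvl` and `tcp_keep_cnt` map onto `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT`. Those constants are platform-dependent, so the sketch skips any that are missing.

```python
import socket


def keepalive_socket_options(idle=30, interval=30, count=30):
    """Build (level, option, value) tuples that enable TCP keepalive.

    Hypothetical helper; the defaults mirror the config values above.
    """
    options = [(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)]
    tuning = (
        ("TCP_KEEPIDLE", idle),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", count),       # unanswered probes before giving up
    )
    for name, value in tuning:
        # Not every platform exposes every constant, so check first.
        if hasattr(socket, name):
            options.append((socket.IPPROTO_TCP, getattr(socket, name), value))
    return options


def apply_options(sock, options):
    """Apply each (level, option, value) tuple to an open socket."""
    for level, option, value in options:
        sock.setsockopt(level, option, value)


options = keepalive_socket_options()
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
apply_options(sock, options)
keepalive_enabled = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

If that mapping holds, a connection to a silent peer would be dropped after roughly 30 + 30 * 30 = 930 seconds, so these values are fairly lenient.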
I have another Airflow instance running on Kubernetes in AWS. That one runs
fine with any version.
I realized the problem is with Azure Kubernetes, specifically the REST API
calls to the API server. I can still see some Kubernetes connection errors in
the scheduler logs, but those are not fatal, since Airflow tries to reconnect
and succeeds the second time.
Now my logs are full of connection errors:
```
[2021-02-24 00:44:06,188] {kubernetes_executor.py:106} WARNING - There was a
timeout error accessing the Kube API. Retrying request.
Traceback (most recent call last):
File
"/home/airflow/.local/lib/python3.8/site-packages/urllib3/contrib/pyopenssl.py",
line 313, in recv_into
return self.connection.recv_into(*args, **kwargs)
File "/home/airflow/.local/lib/python3.8/site-packages/OpenSSL/SSL.py",
line 1840, in recv_into
self._raise_ssl_error(self._ssl, result)
File "/home/airflow/.local/lib/python3.8/site-packages/OpenSSL/SSL.py",
line 1646, in _raise_ssl_error
raise WantReadError()
OpenSSL.SSL.WantReadError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File
"/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line
436, in _error_catcher
yield
File
"/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line
763, in read_chunked
self._update_chunk_length()
File
"/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line
693, in _update_chunk_length
line = self._fp.fp.readline()
File "/usr/local/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)
File
"/home/airflow/.local/lib/python3.8/site-packages/urllib3/contrib/pyopenssl.py",
line 326, in recv_into
raise timeout("The read operation timed out")
socket.timeout: The read operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/executors/kubernetes_executor.py",
line 102, in run
self.resource_version = self._run(
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/executors/kubernetes_executor.py",
line 145, in _run
for event in list_worker_pods():
File
"/home/airflow/.local/lib/python3.8/site-packages/kubernetes/watch/watch.py",
line 144, in stream
for line in iter_resp_lines(resp):
File
"/home/airflow/.local/lib/python3.8/site-packages/kubernetes/watch/watch.py",
line 46, in iter_resp_lines
for seg in resp.read_chunked(decode_content=False):
File
"/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line
792, in read_chunked
self._original_response.close()
File "/usr/local/lib/python3.8/contextlib.py", line 131, in __exit__
self.gen.throw(type, value, traceback)
File
"/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line
441, in _error_catcher
raise ReadTimeoutError(self._pool, None, "Read timed out.")
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='10.0.0.1',
port=443): Read timed out.
[2021-02-24 00:44:07,330] {kubernetes_executor.py:126} INFO - Event: and now
my watch begins starting at resource_version: 0
[2021-02-24 00:45:07,394] {kubernetes_executor.py:106} WARNING - There was a
timeout error accessing the Kube API. Retrying request.
Traceback (most recent call last):
File
"/home/airflow/.local/lib/python3.8/site-packages/urllib3/contrib/pyopenssl.py",
line 313, in recv_into
return self.connection.recv_into(*args, **kwargs)
File "/home/airflow/.local/lib/python3.8/site-packages/OpenSSL/SSL.py",
line 1840, in recv_into
self._raise_ssl_error(self._ssl, result)
File "/home/airflow/.local/lib/python3.8/site-packages/OpenSSL/SSL.py",
line 1646, in _raise_ssl_error
raise WantReadError()
OpenSSL.SSL.WantReadError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File
"/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line
436, in _error_catcher
yield
File
"/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line
763, in read_chunked
self._update_chunk_length()
File
"/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line
693, in _update_chunk_length
line = self._fp.fp.readline()
File "/usr/local/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)
File
"/home/airflow/.local/lib/python3.8/site-packages/urllib3/contrib/pyopenssl.py",
line 326, in recv_into
raise timeout("The read operation timed out")
socket.timeout: The read operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/executors/kubernetes_executor.py",
line 102, in run
self.resource_version = self._run(
File
"/home/airflow/.local/lib/python3.8/site-packages/airflow/executors/kubernetes_executor.py",
line 145, in _run
for event in list_worker_pods():
File
"/home/airflow/.local/lib/python3.8/site-packages/kubernetes/watch/watch.py",
line 144, in stream
for line in iter_resp_lines(resp):
File
"/home/airflow/.local/lib/python3.8/site-packages/kubernetes/watch/watch.py",
line 46, in iter_resp_lines
for seg in resp.read_chunked(decode_content=False):
File
"/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line
792, in read_chunked
self._original_response.close()
File "/usr/local/lib/python3.8/contextlib.py", line 131, in __exit__
self.gen.throw(type, value, traceback)
File
"/home/airflow/.local/lib/python3.8/site-packages/urllib3/response.py", line
441, in _error_catcher
raise ReadTimeoutError(self._pool, None, "Read timed out.")
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='10.0.0.1',
port=443): Read timed out.
```
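The pattern visible in the log above (a read timeout, a warning, then the watch starting again) can be sketched roughly as follows. This is a simplified illustration, not the code in `kubernetes_executor.py`: `run_watch_with_retry` and `flaky_stream_factory` are hypothetical names, the built-in `TimeoutError` stands in for urllib3's `ReadTimeoutError`, and the sketch resumes from the last seen resource version, whereas the log shows the watch restarting at `resource_version: 0`.

```python
import logging

log = logging.getLogger("watch_sketch")


def run_watch_with_retry(stream_events, max_attempts=3):
    """Consume watch events, reopening the stream when a read times out."""
    resource_version = "0"
    seen = []
    for attempt in range(1, max_attempts + 1):
        try:
            for event in stream_events(resource_version):
                seen.append(event)
                resource_version = event["resource_version"]
            return seen  # stream ended normally
        except TimeoutError:
            # Stand-in for urllib3.exceptions.ReadTimeoutError.
            log.warning("Timeout reading the watch (attempt %d). Retrying.", attempt)
    return seen


def flaky_stream_factory():
    """Return a fake stream that times out on its first call only."""
    calls = {"count": 0}

    def stream(resource_version):
        calls["count"] += 1
        if calls["count"] == 1:
            yield {"name": "pod-a", "resource_version": "1"}
            raise TimeoutError("The read operation timed out")
        yield {"name": "pod-b", "resource_version": "2"}

    return stream


events = run_watch_with_retry(flaky_stream_factory())
```

The first call yields one event and then times out. The second call completes, so no events are lost, but every timeout still leaves a warning in the logs.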
The config change is not the prettiest solution, but it works. The underlying
issue definitely needs to be addressed somehow.