arkadiuszbach removed a comment on issue #10860:
URL: https://github.com/apache/airflow/issues/10860#issuecomment-791796674
On Azure, LoadBalancers have a TCP idle timeout that defaults to 4 minutes (it
is visible in the JSON view), so if you connect and don't interact with the
connection for more than 4 minutes, the LoadBalancer drops it.
Airflow uses a Kubernetes watcher in stream mode to monitor pod events: it
connects and waits for events. If no events arrive for more than 4 minutes,
the LoadBalancer drops the connection, but the watcher is still listening for
events and doesn't even know the connection was dropped.
When you add:
```
- name: AIRFLOW__KUBERNETES__KUBE_CLIENT_REQUEST_ARGS
  value: '{"_request_timeout" : [60, 60]}'
```
Then, if the watcher does not get any events for more than 60 seconds, a read
timeout happens, but this time the disconnect is on the client side (Airflow).
There is a `while(true)` loop in the code, so it connects again, and that is
what you see in the logs when it says "and now my watch begins".
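In other words, `_request_timeout` turns a silent hang into a client-side read timeout that the surrounding retry loop can recover from. A rough sketch of that pattern (not the real `KubernetesJobWatcher` code, and the namespace is again just an example):
```
from urllib3.exceptions import ReadTimeoutError
from kubernetes import client, config, watch

config.load_incluster_config()
v1 = client.CoreV1Api()

resource_version = "0"
while True:   # the while(true) around the watcher
    print("and now my watch begins, starting at resource_version", resource_version)
    w = watch.Watch()
    try:
        # the second value of _request_timeout is urllib3's read timeout in seconds
        for event in w.stream(v1.list_namespaced_pod,
                              namespace="airflow",
                              resource_version=resource_version,
                              _request_timeout=(60, 60)):
            resource_version = event["object"].metadata.resource_version
    except ReadTimeoutError:
        # no events for 60s: the read times out and we reconnect, which is
        # the "error" that shows up in the scheduler logs
        continue
```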
I tried the solution with `_request_timeout` and it works, but I didn't like
these errors in the logs, so I looked around and found the following, which is
pretty much the same situation (it also involves LoadBalancers) and describes
it in more detail:
https://www.finbourne.com/blog/the-mysterious-hanging-client-tcp-keep-alives
So the solution is to enable TCP keep-alive: it probes through the
LoadBalancer so the idle timeout is never triggered, and even if the
LoadBalancer disconnects for some other reason, the unanswered probes (for
example 3 of them, 60 seconds apart) make the client simply drop the dead
connection and connect again.
More information about keepalive:
https://stackoverflow.com/questions/1480236/does-a-tcp-socket-connection-have-a-keep-alive
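To make the numbers concrete, here is what those keep-alive settings mean on a single plain socket (a sketch with Linux-only option names; the host and port are placeholders). The first probe goes out after 60 seconds of idle, well under the 4-minute LoadBalancer timeout, and a dead peer is declared after roughly 60 + 3 * 60 = 240 seconds of silence:
```
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)     # turn keep-alive on
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)   # idle seconds before the first probe
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 60)  # seconds between probes
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3)     # failed probes before the OS kills the socket
s.connect(("example.com", 443))
```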
With the help of the approach above and https://github.com/maganaluis/k8s-api-python
I was able to make it work.
So I just downloaded the Airflow version I had from PyPI (Airflow 1.10.14),
took the `airflow` file from `airflow/bin`, and after
`if __name__ == '__main__':` added:
```
import socket
from urllib3 import connection

# Append TCP keep-alive options to urllib3's defaults so that every new
# connection (including the Kubernetes watcher's) sends keep-alive probes.
connection.HTTPConnection.default_socket_options += [
    (socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1),      # enable keep-alive
    (socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60),    # first probe after 60s idle
    (socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 60),   # then probe every 60s
    (socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3),      # give up after 3 failed probes
]
```
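If you want a quick sanity check that the patch actually registered (an optional check, not part of the fix), the class attribute should now end with the four keep-alive options, after urllib3's own defaults:
```
from urllib3 import connection
print(connection.HTTPConnection.default_socket_options)
```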
I renamed this file to `airflow_custom_start.py` and added it to the
`AIRFLOW_HOME` directory inside my Airflow Docker image. Then in
`entrypoint.sh` I start the scheduler not with the
`airflow scheduler` command,
but with:
`python $AIRFLOW_HOME/airflow_custom_start.py scheduler`
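For reference, the wrapper ends up looking roughly like this. Everything below the keep-alive block is only my recollection of the stock 1.10.x `airflow/bin/airflow` script (trimmed; the real file has a bit more in it), so copy the file from your own installation rather than from this sketch:
```
#!/usr/bin/env python
# airflow_custom_start.py -- a copy of airflow/bin/airflow (Airflow 1.10.x)
# with the keep-alive patch inserted right after `if __name__ == '__main__':`
import argcomplete
from airflow.bin.cli import CLIFactory

if __name__ == '__main__':
    # keep-alive patch (same snippet as above)
    import socket
    from urllib3 import connection
    connection.HTTPConnection.default_socket_options += [
        (socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1),
        (socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60),
        (socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 60),
        (socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3),
    ]

    # rest of the stock script: build the CLI parser and dispatch to the
    # requested subcommand ("scheduler" when started from entrypoint.sh)
    parser = CLIFactory.get_parser()
    argcomplete.autocomplete(parser)
    args = parser.parse_args()
    args.func(args)
```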
Also remember to remove `_request_timeout`, otherwise it will keep disconnecting.