arkadiuszbach removed a comment on issue #10860:
URL: https://github.com/apache/airflow/issues/10860#issuecomment-791796674


   On Azure, LoadBalancers have a TCP idle timeout that defaults to 4 minutes 
(it is visible in the JSON View), so if you connect and don't interact with the 
connection for more than 4 minutes, the LoadBalancer drops it.
   
   Airflow uses the Kubernetes watcher in stream mode to monitor pod events.
   So it connects and waits for events; if there are no events for more than 4 
minutes, the LoadBalancer drops the connection, but the watcher keeps listening 
for events and doesn't even know the connection was dropped.
   
   When you add:
   ```
    - name: AIRFLOW__KUBERNETES__KUBE_CLIENT_REQUEST_ARGS
      value: '{"_request_timeout" : [60, 60]}'
   ```
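   For reference, here is a small sketch (the variable names are my own) of how that JSON value turns into a connect/read timeout pair — Airflow parses the env var as JSON and passes the resulting dict as keyword arguments to the Kubernetes Python client:

```python
import json

# Sketch: the env-var value above is parsed as JSON, and _request_timeout
# becomes a (connect, read) timeout pair for the underlying HTTP requests.
raw = '{"_request_timeout" : [60, 60]}'
kwargs = json.loads(raw)
connect_timeout, read_timeout = kwargs["_request_timeout"]
```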
   Then, if the watcher does not get any events for more than 60 seconds, a Read 
Timeout happens, but this time the disconnect is on the client side (Airflow) - 
there is a `while(true)` in the code, so it will connect again, and that is what 
you can see in the logs when it says "and now my watch begins"
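   That reconnect loop can be sketched roughly like this (the `flaky_fetch` stand-in and the event values are hypothetical, not Airflow's actual code):

```python
import socket

def watch_events(fetch):
    # fetch() stands in for reading the Kubernetes watch stream; it raises
    # socket.timeout when the read timeout expires without any events.
    while True:
        try:
            return fetch()
        except socket.timeout:
            # Client-side read timeout: log and reconnect.
            print("and now my watch begins")

calls = {"n": 0}

def flaky_fetch():
    # Hypothetical stream: times out twice, then delivers an event.
    calls["n"] += 1
    if calls["n"] < 3:
        raise socket.timeout("no events before the read timeout")
    return ["pod-event"]
```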
   
   I tried the solution with `_request_timeout` and it works, but I didn't like 
these errors in the logs, so I looked around and found the following, which is 
pretty much the same situation (it also involves LoadBalancers) and describes it 
in more detail: 
https://www.finbourne.com/blog/the-mysterious-hanging-client-tcp-keep-alives
   
   So the solution is to add TCP keep-alive: it will probe through the 
LoadBalancer, so the idle timeout is never triggered. Even if for some reason the 
LoadBalancer disconnects anyway, keep-alive will probe the connection, and if it 
does not respond (for example, 3 probes 60 seconds apart) the client will simply 
disconnect and connect again.
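   Some back-of-the-envelope arithmetic for the settings used below (60s idle, 60s probe interval, 3 probes) shows the worst-case time to notice a dead connection:

```python
TCP_KEEPIDLE = 60   # seconds of idle before the first probe
TCP_KEEPINTVL = 60  # seconds between unanswered probes
TCP_KEEPCNT = 3     # unanswered probes before the socket is dropped

# Worst case: full idle period, then every probe goes unanswered.
worst_case_seconds = TCP_KEEPIDLE + TCP_KEEPINTVL * TCP_KEEPCNT
```

That is 240 seconds, so even a silently dropped connection is recycled within a few minutes instead of hanging forever.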
   
   More information about keepalive:
   
https://stackoverflow.com/questions/1480236/does-a-tcp-socket-connection-have-a-keep-alive
   
   With the help of the solution above, 
https://github.com/maganaluis/k8s-api-python, I was able to make it work.
   So I just downloaded the Airflow version I had from PyPI (Airflow 1.10.14), 
took the `airflow` file from `airflow/bin`, and after 
   `if __name__ == '__main__':` added:
   ```
       import socket
       from urllib3 import connection

       # Enable TCP keep-alive on every urllib3 HTTP connection
       # (the Kubernetes Python client uses urllib3 under the hood):
       connection.HTTPConnection.default_socket_options += [
           (socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1),     # turn keep-alive on
           (socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60),   # first probe after 60s idle
           (socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 60),  # probe every 60s after that
           (socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3)      # give up after 3 unanswered probes
       ]
   ```
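   As a quick sanity check that the kernel accepts these options, you can set them on a plain socket and read them back (note that `TCP_KEEPIDLE`, `TCP_KEEPINTVL`, and `TCP_KEEPCNT` are Linux-specific constants):

```python
import socket

# Set the same keep-alive options directly on a raw TCP socket.
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, 60)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, 60)
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, 3)

# Read the values back to confirm they were applied.
keepalive_on = s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE)
keep_idle = s.getsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE)
s.close()
```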
   I renamed this file to `airflow_custom_start.py` and added it to the 
AIRFLOW_HOME directory inside my Airflow Docker image. Then, in 
`entrypoint.sh`, I started the scheduler not with:
    `airflow scheduler`
   but with:
    `python $AIRFLOW_HOME/airflow_custom_start.py scheduler`
   
   Also remember to remove `_request_timeout`, otherwise it will keep disconnecting.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

