potiuk commented on issue #24731:
URL: https://github.com/apache/airflow/issues/24731#issuecomment-1318516581

   Thanks for the diagnosis, but I think you applied one of the "good"
solutions, and there is not much we can or will do in Airflow about that.
   
   I think what you did is the right approach (or one of them), not a
workaround. This is expected. Airflow has no support for an active/active
setup for Redis or Postgres and expects to talk to one database server only.
There is no way for Airflow components to recover when there is an established
connection and the IP address of the component they talk to changes in such a
way that Airflow does not even know that the other party has changed its
address. This is really a deployment issue, and I think Airflow should not
take such changes into account.
   
   Airflow is not a "critical/real-time" service that should react and
reconfigure its networking dynamically, and we have no intention of turning it
into such a service. Developing such an "auto-healing" service is far more
costly, and unless someone comes up with an idea, creates an Airflow
Improvement Proposal, and implements such auto-healing, this is not something
that is going to happen. There are many consequences and complexities in
implementing such services, and there is no need to do so for Airflow, because
it is perfectly fine to restart and redeploy Airflow components from time to
time. That is far easier and less costly for development and maintenance.
   
   This task is left to the deployment. That is why, for example, our Helm
chart has liveness probes and health checks, and auto-healing in K8S is done
exactly the way you did it: when a service becomes unhealthy, you restart it.
This is a perfectly OK and viable solution, especially for things like virtual
IP changes, which happen infrequently.
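
   To illustrate the restart-on-unhealthy pattern outside of K8S, here is a
minimal Python sketch. It is not what the Helm chart does. The names
`supervise`, `is_healthy`, `restart` and `failure_threshold` are all
hypothetical, and you would have to supply your own health check and restart
command for your deployment.

```python
import time
from typing import Callable, Optional


def supervise(
    is_healthy: Callable[[], bool],
    restart: Callable[[], None],
    failure_threshold: int = 3,
    interval_seconds: float = 10.0,
    max_checks: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Restart a component after `failure_threshold` consecutive failed checks.

    `is_healthy` and `restart` are supplied by the deployment (assumption:
    e.g. a check command and a service-manager restart). Returns the number
    of restarts performed. With max_checks=None the loop runs forever.
    """
    consecutive_failures = 0
    restarts = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        checks += 1
        if is_healthy():
            # A single success resets the failure counter.
            consecutive_failures = 0
        else:
            consecutive_failures += 1
            if consecutive_failures >= failure_threshold:
                restart()
                restarts += 1
                consecutive_failures = 0
        sleep(interval_seconds)
    return restarts
```

   Requiring several consecutive failures before restarting avoids bouncing
the component on a single transient error, at the cost of slower recovery.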
   
   An even better solution for you would be to react to the IP-change event
and restart the services immediately. This is the kind of thing that usually
should and can be done at the deployment level. Airflow has no knowledge of
such events and cannot react to them, but your deployment can. And should.
This will help you recover much faster.
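
   As a rough sketch of reacting to an address change at the deployment
level, the Python below polls name resolution and calls a hook when the
resolved addresses differ from the last known ones. It assumes the change is
visible through DNS for a hostname you control, which may not be true for
every virtual IP setup. `resolve_addresses`, `watch_for_ip_change`, the host
name and the restart command in the comments are all hypothetical.

```python
import socket
import time
from typing import Callable, FrozenSet, Optional


def resolve_addresses(host: str, port: int) -> FrozenSet[str]:
    """Return the set of IP addresses that `host` currently resolves to."""
    infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string for both IPv4 and IPv6.
    return frozenset(info[4][0] for info in infos)


def watch_for_ip_change(
    resolve: Callable[[], FrozenSet[str]],
    on_change: Callable[[FrozenSet[str], FrozenSet[str]], None],
    interval_seconds: float = 5.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll `resolve` and call `on_change(old, new)` when the result changes.

    With max_polls=None the loop runs forever.
    """
    known = resolve()
    polls = 0
    while max_polls is None or polls < max_polls:
        sleep(interval_seconds)
        polls += 1
        try:
            current = resolve()
        except OSError:
            # Resolution failed (socket.gaierror is an OSError); try again later.
            continue
        if current != known:
            on_change(known, current)
            known = current


# Hypothetical usage (host name and restart command are placeholders):
#
#   import subprocess
#   watch_for_ip_change(
#       lambda: resolve_addresses("postgres.internal.example", 5432),
#       lambda old, new: subprocess.run(
#           ["systemctl", "restart", "airflow-scheduler"], check=True
#       ),
#   )
```

   If your infrastructure can emit an event when the virtual IP moves,
hooking the restart to that event directly is likely better than polling.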
   
   Another option, if you want to avoid such restarts, is to avoid changing
the virtual IP and use static IP addresses allocated to each component.
Changing virtual IP addresses is usually not something that happens in an
enterprise setup. It is safe to assume that you can arrange for IP addresses
to be static: even if you have some dynamically changing public IP addresses,
you can usually have static private ones and configure your deployment to use
them.
   

