potiuk commented on issue #18011:
URL: https://github.com/apache/airflow/issues/18011#issuecomment-923419536


   >  IMHO Airflow should not be falling over in the heartbeats b/c of a 
first-time missed connection. There should be some intelligent retry logic in 
the heartbeats...
   
   Actually I do not agree with that statement.
   
    Airflow should rely on the metadata database being available at all times, 
and losing connectivity in the middle of a transaction should not be handled by 
Airflow. That adds terrible complexity to the code and IMHO is not needed to 
deal with this kind of (apparent) connectivity instability, especially since 
this is a timeout while trying to connect to the database. With SQLAlchemy and 
the ORM layer we often have no control over when the session and connection 
are established, and trying to handle all such failures at the application 
level is complex.
   
   AND it is also not needed at the application level, especially in the case 
of Postgres. For quite some time (and also in our Helm Chart) we have 
recommended that everyone using Postgres put PGBouncer in front of the database 
as a proxy. It also deals nicely with the number of open connections (Postgres 
is not good at handling many parallel connections: its connection model is 
process-based, and thus it is resource-hungry when many connections are 
open).
   
   PGBouncer not only manages connection pools shared between components, but 
also lets you react to network conditions like this one. First of all, it 
reuses existing connections, so there will be far fewer connection open/close 
events between PGBouncer and the database. All the connections opened by 
Airflow will go to a locally available PGBouncer, which will make them totally 
resilient to networking issues. PGBouncer will then handle the errors, and you 
can fine-tune that handling if you have connectivity problems to your 
database.
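   
   As a rough illustration of the kind of fine-tuning meant here, a minimal 
`pgbouncer.ini` might look like the sketch below. The host name, database 
name, paths and all numeric values are placeholders I made up for the example, 
not settings taken from this issue or from the Helm Chart; only the option 
names are real PGBouncer settings.

```ini
; pgbouncer.ini - illustrative sketch, all values are assumptions
[databases]
; "airflow" and the host below are placeholders for the metadata database
airflow = host=postgres.example.internal port=5432 dbname=airflow

[pgbouncer]
; listen locally, next to the Airflow components
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt

; hand server connections back to the pool after each transaction
pool_mode = transaction
max_client_conn = 100
default_pool_size = 10

; seconds to wait for a connection to Postgres before giving up
server_connect_timeout = 15
; seconds to wait before retrying after a failed connect or login
server_login_retry = 15
; seconds a client query may wait for a free server connection
query_wait_timeout = 120
```

   Airflow's `sql_alchemy_conn` would then point at `127.0.0.1:6432` instead of 
the Postgres host, so the components only ever open connections to the local 
PGBouncer.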
   
   @WattsInABox - can you please add PGBouncer(s) to your deployment and let 
us know whether that improves the situation? I think this is not even a 
workaround - it's actually a good solution (which we generally recommend for 
any deployment with Postgres).
   
   I will convert this into a discussion until we hear back from you about 
your experience with PGBouncer and, if the problems are still occurring once 
PGBouncer is running, with a reproducible case.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.


