taylorfinnell commented on issue #18011:
URL: https://github.com/apache/airflow/issues/18011#issuecomment-922384877
Hi @ephraimbuddy - I work with @WattsInABox. We don't see `FATAL: sorry, too
many clients already.` but we do see:
```
Traceback (most recent call last):
  File "/opt/app-root/lib64/python3.8/site-packages/airflow/jobs/base_job.py", line 202, in heartbeat
    session.merge(self)
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 2166, in merge
    return self._merge(
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/orm/session.py", line 2244, in _merge
    merged = self.query(mapper.class_).get(key[1])
  File "/opt/app-root/lib64/python3.8/site-packages/sqlalchemy/orm/query.py", line 1018, in get
    return self._get_impl(ident, loading.load_on_pk_identity)
....
psycopg2.OperationalError: could not connect to server: Connection timed out
```
This causes the job to be SIGTERM'ed (most of the time, it seems). The tasks
now retry since we have #16301, and they eventually succeed. Sometimes a task
is SIGTERM'ed 5 or more times before it succeeds, which is not ideal for tasks
that take over an hour. I also suspect that this sometimes results in
downstream tasks being set to `upstream_failed` when in fact all upstream
tasks succeeded, but I can't prove it.
We tried bumping `AIRFLOW__SCHEDULER__JOB_HEARTBEAT_SEC` to `60` to ease the
load on the database, with no luck. The error also happens when only a couple
of DAGs are running, so there is not much load on our nodes or the database.
We don't think it's a networking issue.
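One rough way to check the networking assumption is to time plain TCP connections to the database from the same node that runs the tasks. The sketch below uses only the standard library; `probe_tcp` is a hypothetical helper name, and the host and port passed to it are placeholders, not values from our setup.

```python
import socket
import time


def probe_tcp(host, port, attempts=5, timeout=5.0):
    """Open a TCP connection `attempts` times.

    Returns a list of (succeeded, seconds_elapsed) tuples, one per attempt.
    """
    results = []
    for _ in range(attempts):
        start = time.monotonic()
        try:
            # create_connection raises OSError (timeouts included) on failure.
            with socket.create_connection((host, port), timeout=timeout):
                ok = True
        except OSError:
            ok = False
        results.append((ok, time.monotonic() - start))
    return results
```

Calling something like `probe_tcp("db.example.internal", 5432)` (a placeholder endpoint) in a loop while tasks run would show whether plain TCP connects also time out. It only exercises the TCP handshake, so it says nothing about authentication or connection limits on the Postgres side.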
Our SQLAlchemy pool size is 350. This might be high, but my understanding is
that the pool does not create connections until they are needed, and according
to AWS monitoring the most connections we ever hit at peak time is ~300-370,
which should be totally manageable on our `db.m6g.4xlarge` instance.
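For context on the pool sizing: a SQLAlchemy `QueuePool` allows up to `pool_size + max_overflow` connections, and each process that creates its own engine gets its own pool. A minimal sketch of the resulting upper bound follows; the `max_overflow` value and the process count in the example are assumptions for illustration, not numbers from our deployment.

```python
def worst_case_connections(pool_size, max_overflow, processes):
    """Upper bound on open connections if every process fills its own pool."""
    return (pool_size + max_overflow) * processes


# pool_size=350 is the setting mentioned above; max_overflow=10 and
# processes=4 are assumed values used only to illustrate the arithmetic.
print(worst_case_connections(pool_size=350, max_overflow=10, processes=4))  # 1440
```

This is only a ceiling. Because connections are opened on demand, the observed count can stay far below it.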
Do you have any additional advice on things to try?