Alejandro Fernandez created AIRFLOW-2442:
--------------------------------------------
Summary: Airflow run command leaves database connections open,
which can hit the database limit
Key: AIRFLOW-2442
URL: https://issues.apache.org/jira/browse/AIRFLOW-2442
Project: Apache Airflow
Issue Type: Bug
Components: cli
Affects Versions: Airflow 1.8, 1.8.0
Reporter: Alejandro Fernandez
Assignee: Alejandro Fernandez
Fix For: Airflow 2.0
*Summary*
The "airflow run" command creates a connection to the database and leaves it
open (until killed by SQLAlchemy later). The number of these connections can
skyrocket whenever hundreds/thousands of tasks are launched simultaneously, and
potentially hit the database connection limit.
The problem is that in cli.py, the run() method first calls
{code}settings.configure_orm(disable_connection_pool=True){code} correctly
to use a NullPool, but then parses any custom configs and again calls
{code}settings.configure_orm(){code}, thereby overriding the desired behavior
with a QueuePool.
The QueuePool uses the default configs for SQL_ALCHEMY_POOL_SIZE (5
connections) and SQL_ALCHEMY_POOL_RECYCLE (1 hour). This means that while the
task is running and the executor is sending heartbeats, the connection stays
open but idle until it is killed by SQLAlchemy.
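As a rough back-of-the-envelope sketch of why this matters at scale, the snippet below multiplies the number of simultaneous tasks by the connections each task process might hold. The task count (1000) and the database limit (151) are illustrative assumptions, not figures from this issue. The lower bound assumes one open connection per process, and the upper bound assumes each process fills its pool of 5 and ignores any overflow connections.

```python
# Rough estimate of idle connections held by simultaneous "airflow run" processes.
# Assumptions: task count and database limit are illustrative values only.

POOL_SIZE = 5            # default SQL_ALCHEMY_POOL_SIZE cited in the issue
ASSUMED_DB_LIMIT = 151   # hypothetical database connection limit


def open_connections(num_tasks, conns_per_task):
    """Total connections if every task process holds conns_per_task open."""
    return num_tasks * conns_per_task


num_tasks = 1000  # hypothetical number of tasks launched at once
low = open_connections(num_tasks, 1)           # one connection per process
high = open_connections(num_tasks, POOL_SIZE)  # every pool filled, no overflow

print(low, high, low > ASSUMED_DB_LIMIT)
```

Even the lower bound exceeds the assumed limit here, whereas with a NullPool each connection is closed as soon as it is released.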
This fixes a bug introduced by
[https://github.com/apache/incubator-airflow/pull/1934] in
[https://github.com/apache/incubator-airflow/pull/1934/commits/b380013634b02bb4c1b9d1cc587ccd12383820b6#diff-1c2404a3a60f829127232842250ff406R344]
which is present in branches 1-8-stable, 1-9-stable, and 1-10-test.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)