trevorprater opened a new issue #9837:
URL: https://github.com/apache/airflow/issues/9837


   
   
   
   **Apache Airflow version**: 1.10.10
   
   **Environment**: CentOS Linux 7
   
   - **Cloud provider or hardware configuration**:
   - **OS** (e.g. from /etc/os-release): CentOS Linux 7
   - **Kernel** (e.g. `uname -a`): cannot disclose
   - **Install tools**: n/a
   - **Others**: n/a
   
   **What happened**:
   I am re-posting [#9735](https://github.com/apache/airflow/issues/9735) (the original did not use the issue template). I have recently seen the same problem, which resulted in an 800 MB log file for a single task run.
   
   ```
   "ERROR - LocalTaskJob heartbeat got an exception" was logged more than 30,000 times, producing a massive log file.
   According to #5589 and #6284, this issue has been fixed. Both fixes were included in 1.10.6, but the problem still exists.
   
   (Background on this error at: http://sqlalche.me/e/e3q8)
   ```
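Even without fixing the root cause, collapsing identical log lines would have kept the file at a sane size. Below is a rough sketch using only the standard library `logging` module, assuming that deduplicating on level plus formatted message text is good enough. `RepeatSuppressFilter` is a hypothetical name, not something Airflow provides, and the 60 s window is an arbitrary choice.

```python
import logging
import time


class RepeatSuppressFilter(logging.Filter):
    """Hypothetical filter (not part of Airflow): let each distinct message
    through at most once per `interval` seconds and count dropped copies."""

    def __init__(self, interval=60.0, clock=time.monotonic):
        super().__init__()
        self.interval = interval
        self.clock = clock  # injectable so the window can be tested without waiting
        self._last_emitted = {}  # (level, message text) -> time it was last let through
        self.suppressed = 0

    def filter(self, record):
        # Key on level + formatted message; the exception text is not part of the key.
        key = (record.levelno, record.getMessage())
        now = self.clock()
        last = self._last_emitted.get(key)
        if last is not None and now - last < self.interval:
            self.suppressed += 1
            return False  # duplicate inside the window: drop it
        self._last_emitted[key] = now
        return True


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    log = logging.getLogger("heartbeat_demo")
    spam_filter = RepeatSuppressFilter(interval=60.0)
    log.addFilter(spam_filter)
    for _ in range(30000):
        log.error("LocalTaskJob heartbeat got an exception")
    log.info("suppressed %d duplicate lines", spam_filter.suppressed)
```

The `_last_emitted` dict is never pruned, which is acceptable for a sketch but would need a bound in real use.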
   
   
   **What you expected to happen**:
   
   I would expect the DAG to fail in a timely manner due to the lack of worker heartbeats.
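Roughly, the behaviour I would expect looks like this: tolerate heartbeat failures for a bounded grace period, log the traceback once per outage, and then give up so the task fails instead of retrying (and logging) forever. This is a minimal standalone sketch, not Airflow's actual `BaseJob.heartbeat` implementation; `heartbeat_loop`, `HeartbeatGaveUp`, `grace`, and the 300 s default are hypothetical names and values. `clock` and `sleep` are injectable so the logic can be exercised without a real database.

```python
import logging
import time
from typing import Callable, Optional

log = logging.getLogger(__name__)


class HeartbeatGaveUp(RuntimeError):
    """Hypothetical error: heartbeats have failed for longer than the grace period."""


def heartbeat_loop(
    beat: Callable[[], None],
    interval: float = 5.0,
    grace: float = 300.0,
    max_attempts: Optional[int] = None,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `beat` every `interval` seconds and return the number of successful beats.

    A failing `beat` is retried, but once it has been failing continuously for
    more than `grace` seconds the loop raises HeartbeatGaveUp. Only the first
    failure of each outage is logged with a traceback.
    """
    ok = 0
    attempts = 0
    first_failure = None  # clock() value when the current failure streak began
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        try:
            beat()
        except Exception:
            now = clock()
            if first_failure is None:
                first_failure = now
                log.exception("heartbeat failed; retrying for up to %.0fs", grace)
            elif now - first_failure > grace:
                raise HeartbeatGaveUp(
                    "no successful heartbeat for %.0fs" % (now - first_failure)
                )
        else:
            ok += 1
            first_failure = None  # outage over: the database is reachable again
        sleep(interval)
    return ok
```

Here `beat` stands in for whatever writes the heartbeat row; presumably, once `HeartbeatGaveUp` propagates, the task would be marked failed rather than looping indefinitely.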
   
   **How to reproduce it**:
   
   This appears to occur randomly, presumably while the database is performing poorly. I suspect it could be reproduced by overloading the DB while a DAG is running.
   
   
   **How often does this problem occur?**
   
   Rarely: it occurs when the database becomes unreachable.
   The logs pasted below are from the linked issue above, not my own. In my logs, the underlying database became unavailable for some time; in the logs below, the DB appears to have too many open connections. I am using MySQL, whereas the referenced logs come from Postgres, but the root cause may still be the same.
   
   <details>
   
   ```
   [2020-06-02 03:47:15,676] {logging_mixin.py:112} INFO - [2020-06-02 03:47:15,658] {base_job.py:205} ERROR - LocalTaskJob heartbeat got an exception
   
   Traceback (most recent call last):
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 2285, in _wrap_pool_connect
       return fn()
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 363, in connect
       return _ConnectionFairy._checkout(self)
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 773, in _checkout
       fairy = _ConnectionRecord.checkout(pool)
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 492, in checkout
       rec = pool._do_get()
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/pool/impl.py", line 238, in _do_get
       return self._create_connection()
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 308, in _create_connection
       return _ConnectionRecord(self)
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 437, in __init__
       self.__connect(first_connect_check=True)
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 657, in __connect
       pool.logger.debug("Error on connect(): %s", e)
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 69, in __exit__
       exc_value, with_traceback=exc_tb,
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
       raise exception
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 652, in __connect
       connection = pool._invoke_creator(self)
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/engine/strategies.py", line 114, in connect
       return dialect.connect(*cargs, **cparams)
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/engine/default.py", line 488, in connect
       return self.dbapi.connect(*cargs, **cparams)
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/psycopg2/__init__.py", line 127, in connect
       conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
   psycopg2.OperationalError: ERROR: no more connections allowed (max_client_conn)
   
   The above exception was the direct cause of the following exception:
   
   Traceback (most recent call last):
     File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/airflow/jobs/base_job.py", line 172, in heartbeat
       session.merge(self)
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/orm/session.py", line 2128, in merge
       _resolve_conflict_map=_resolve_conflict_map,
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/orm/session.py", line 2201, in _merge
       merged = self.query(mapper.class_).get(key[1])
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 1004, in get
       return self._get_impl(ident, loading.load_on_pk_identity)
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 1119, in _get_impl
       return db_load_fn(self, primary_key_identity)
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/orm/loading.py", line 284, in load_on_pk_identity
       return q.one()
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3358, in one
       ret = self.one_or_none()
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3327, in one_or_none
       ret = list(self)
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3403, in __iter__
       return self._execute_and_instances(context)
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3425, in _execute_and_instances
       querycontext, self._connection_from_session, close_with_result=True
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3440, in _get_bind_args
       mapper=self._bind_mapper(), clause=querycontext.statement, **kw
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3418, in _connection_from_session
       conn = self.session.connection(**kw)
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/orm/session.py", line 1133, in connection
       execution_options=execution_options,
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/orm/session.py", line 1139, in _connection_for_bind
       engine, execution_options
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/orm/session.py", line 432, in _connection_for_bind
       conn = bind._contextual_connect()
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 2251, in _contextual_connect
       self._wrap_pool_connect(self.pool.connect, None),
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 2289, in _wrap_pool_connect
       e, dialect, self
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1555, in _handle_dbapi_exception_noconnection
       sqlalchemy_exception, with_traceback=exc_info[2], from_=e
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
       raise exception
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 2285, in _wrap_pool_connect
       return fn()
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 363, in connect
       return _ConnectionFairy._checkout(self)
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 773, in _checkout
       fairy = _ConnectionRecord.checkout(pool)
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 492, in checkout
       rec = pool._do_get()
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/pool/impl.py", line 238, in _do_get
       return self._create_connection()
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 308, in _create_connection
       return _ConnectionRecord(self)
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 437, in __init__
       self.__connect(first_connect_check=True)
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 657, in __connect
       pool.logger.debug("Error on connect(): %s", e)
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/util/langhelpers.py", line 69, in __exit__
       exc_value, with_traceback=exc_tb,
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 178, in raise_
       raise exception
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/pool/base.py", line 652, in __connect
       connection = pool._invoke_creator(self)
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/engine/strategies.py", line 114, in connect
       return dialect.connect(*cargs, **cparams)
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/sqlalchemy/engine/default.py", line 488, in connect
       return self.dbapi.connect(*cargs, **cparams)
     File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/psycopg2/__init__.py", line 127, in connect
       conn = _connect(dsn, connection_factory=connection_factory, **kwasync)
   sqlalchemy.exc.OperationalError: (psycopg2.OperationalError) ERROR: no more connections allowed (max_client_conn)
   ```
   
   </details>
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
