agrubb86 opened a new issue #10700:
URL: https://github.com/apache/airflow/issues/10700
**Apache Airflow version**: 1.10.11
**Environment**: Linux aarch64
- **Cloud provider or hardware configuration**: AWS
- **OS** (e.g. from /etc/os-release):
NAME="Ubuntu"
VERSION="18.04.4 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.4 LTS"
VERSION_ID="18.04"
- **Kernel** (e.g. `uname -a`):
Linux 4.15.0-1063-aws #67-Ubuntu SMP Mon Mar 2 07:25:24 UTC 2020 aarch64
aarch64 aarch64 GNU/Linux
- **Install tools**: pip3
- **Others**:
Config options for SQLAlchemy:
```
sql_alchemy_conn = mysql+mysqldb://XXXXX:XXXXX@XXXXX/airflow
sql_engine_encoding = utf-8
sql_alchemy_pool_enabled = True
sql_alchemy_pool_size = 10
sql_alchemy_max_overflow = 10
sql_alchemy_pool_recycle = 270
sql_alchemy_pool_pre_ping = True
```
Config options for MySQL server:
`wait_timeout = 300`
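Note on these settings: `sql_alchemy_pool_pre_ping` only validates a connection at the moment it is checked out of the pool; it cannot detect a connection that dies while it is held checked out past the server's `wait_timeout`. A minimal stand-alone sketch of that checkout-time-only behavior (plain Python, not SQLAlchemy itself; `FakeConnection` and `Pool` are hypothetical stand-ins):

```python
class FakeConnection:
    """Stands in for a DBAPI connection that the server can time out."""
    def __init__(self):
        self.alive = True

    def ping(self):
        return self.alive

    def execute(self, sql):
        if not self.alive:
            raise RuntimeError("(2006, 'MySQL server has gone away')")
        return "ok"


class Pool:
    """Pre-ping validates only at checkout, mirroring pool_pre_ping."""
    def __init__(self):
        self._conns = [FakeConnection()]

    def checkout(self):
        conn = self._conns.pop()
        if not conn.ping():          # the pre-ping happens here...
            conn = FakeConnection()  # ...and replaces a dead connection
        return conn


pool = Pool()
conn = pool.checkout()        # passes the pre-ping check at checkout
conn.alive = False            # server-side wait_timeout expires mid-task
try:
    conn.execute("SELECT 1")  # pre-ping cannot protect a held connection
except RuntimeError as exc:
    print(exc)
```

The connection passes the pre-ping at checkout, but once the server drops it mid-task, the next query on it fails with the same error 2006 as in the logs below.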
**What happened**:
From the BashOperator logs:
```
[2020-09-02 03:38:04,363] {bash_operator.py:161} INFO - Command exited with
return code 0
[2020-09-02 03:38:04,371] {taskinstance.py:1150} ERROR -
(_mysql_exceptions.OperationalError) (2006, 'MySQL server has gone away')
[SQL: SELECT airflow.task_instance.try_number AS
airflow_task_instance_try_number, airflow.task_instance.task_id AS
airflow_task_instance_task_id, airflow.task_instance.dag_id AS
airflow_task_instance_dag_id, airflow.task_instance.execution_date AS
airflow_task_instance_execution_date, airflow.task_instance.start_date AS
airflow_task_instance_start_date, airflow.task_instance.end_date AS
airflow_task_instance_end_date, airflow.task_instance.duration AS
airflow_task_instance_duration, airflow.task_instance.state AS
airflow_task_instance_state, airflow.task_instance.max_tries AS
airflow_task_instance_max_tries, airflow.task_instance.hostname AS
airflow_task_instance_hostname, airflow.task_instance.unixname AS
airflow_task_instance_unixname, airflow.task_instance.job_id AS
airflow_task_instance_job_id, airflow.task_instance.pool AS
airflow_task_instance_pool, airflow.task_instance.pool_slots AS
airflow_task_instance_pool_slots, airflow.task_instance.queue AS
airflow_task_instance_queue, airflow.task_instance.priority_weight AS
airflow_task_instance_priority_weight, airflow.task_instance.operator AS
airflow_task_instance_operator, airflow.task_instance.queued_dttm AS
airflow_task_instance_queued_dttm, airflow.task_instance.pid AS
airflow_task_instance_pid, airflow.task_instance.executor_config AS
airflow_task_instance_executor_config
FROM airflow.task_instance
WHERE airflow.task_instance.dag_id = %s AND airflow.task_instance.task_id =
%s AND airflow.task_instance.execution_date = %s
LIMIT %s FOR UPDATE]
[parameters: ('XXXXX', 'XXXXX', datetime.datetime(2020, 9, 1, 21, 10), 1)]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
Traceback (most recent call last):
File
"/opt/airflow/pyfiles/lib/python3.6/site-packages/sqlalchemy/engine/base.py",
line 1278, in _execute_context
cursor, statement, parameters, context
File
"/opt/airflow/pyfiles/lib/python3.6/site-packages/sqlalchemy/engine/default.py",
line 593, in do_execute
cursor.execute(statement, parameters)
File
"/opt/airflow/pyfiles/lib/python3.6/site-packages/MySQLdb/cursors.py", line
255, in execute
self.errorhandler(self, exc, value)
File
"/opt/airflow/pyfiles/lib/python3.6/site-packages/MySQLdb/connections.py", line
50, in defaulterrorhandler
raise errorvalue
File
"/opt/airflow/pyfiles/lib/python3.6/site-packages/MySQLdb/cursors.py", line
252, in execute
res = self._query(query)
File
"/opt/airflow/pyfiles/lib/python3.6/site-packages/MySQLdb/cursors.py", line
378, in _query
db.query(q)
File
"/opt/airflow/pyfiles/lib/python3.6/site-packages/MySQLdb/connections.py", line
280, in query
_mysql.connection.query(self, query)
_mysql_exceptions.OperationalError: (2006, 'MySQL server has gone away')
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File
"/opt/airflow/pyfiles/lib/python3.6/site-packages/airflow/models/taskinstance.py",
line 1003, in _run_raw_task
self.refresh_from_db(lock_for_update=True)
File
"/opt/airflow/pyfiles/lib/python3.6/site-packages/airflow/utils/db.py", line
74, in wrapper
return func(*args, **kwargs)
File
"/opt/airflow/pyfiles/lib/python3.6/site-packages/airflow/models/taskinstance.py",
line 473, in refresh_from_db
ti = qry.with_for_update().first()
File
"/opt/airflow/pyfiles/lib/python3.6/site-packages/sqlalchemy/orm/query.py",
line 3397, in first
ret = list(self[0:1])
File
"/opt/airflow/pyfiles/lib/python3.6/site-packages/sqlalchemy/orm/query.py",
line 3171, in __getitem__
return list(res)
File
"/opt/airflow/pyfiles/lib/python3.6/site-packages/sqlalchemy/orm/query.py",
line 3503, in __iter__
return self._execute_and_instances(context)
File
"/opt/airflow/pyfiles/lib/python3.6/site-packages/sqlalchemy/orm/query.py",
line 3528, in _execute_and_instances
result = conn.execute(querycontext.statement, self._params)
File
"/opt/airflow/pyfiles/lib/python3.6/site-packages/sqlalchemy/engine/base.py",
line 1014, in execute
return meth(self, multiparams, params)
File
"/opt/airflow/pyfiles/lib/python3.6/site-packages/sqlalchemy/sql/elements.py",
line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File
"/opt/airflow/pyfiles/lib/python3.6/site-packages/sqlalchemy/engine/base.py",
line 1133, in _execute_clauseelement
distilled_params,
File
"/opt/airflow/pyfiles/lib/python3.6/site-packages/sqlalchemy/engine/base.py",
line 1318, in _execute_context
e, statement, parameters, cursor, context
File
"/opt/airflow/pyfiles/lib/python3.6/site-packages/sqlalchemy/engine/base.py",
line 1512, in _handle_dbapi_exception
sqlalchemy_exception, with_traceback=exc_info[2], from_=e
File
"/opt/airflow/pyfiles/lib/python3.6/site-packages/sqlalchemy/util/compat.py",
line 178, in raise_
raise exception
File
"/opt/airflow/pyfiles/lib/python3.6/site-packages/sqlalchemy/engine/base.py",
line 1278, in _execute_context
cursor, statement, parameters, context
File
"/opt/airflow/pyfiles/lib/python3.6/site-packages/sqlalchemy/engine/default.py",
line 593, in do_execute
cursor.execute(statement, parameters)
File
"/opt/airflow/pyfiles/lib/python3.6/site-packages/MySQLdb/cursors.py", line
255, in execute
self.errorhandler(self, exc, value)
File
"/opt/airflow/pyfiles/lib/python3.6/site-packages/MySQLdb/connections.py", line
50, in defaulterrorhandler
raise errorvalue
File
"/opt/airflow/pyfiles/lib/python3.6/site-packages/MySQLdb/cursors.py", line
252, in execute
res = self._query(query)
File
"/opt/airflow/pyfiles/lib/python3.6/site-packages/MySQLdb/cursors.py", line
378, in _query
db.query(q)
File
"/opt/airflow/pyfiles/lib/python3.6/site-packages/MySQLdb/connections.py", line
280, in query
_mysql.connection.query(self, query)
sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (2006,
'MySQL server has gone away')
[SQL: SELECT airflow.task_instance.try_number AS
airflow_task_instance_try_number, airflow.task_instance.task_id AS
airflow_task_instance_task_id, airflow.task_instance.dag_id AS
airflow_task_instance_dag_id, airflow.task_instance.execution_date AS
airflow_task_instance_execution_date, airflow.task_instance.start_date AS
airflow_task_instance_start_date, airflow.task_instance.end_date AS
airflow_task_instance_end_date, airflow.task_instance.duration AS
airflow_task_instance_duration, airflow.task_instance.state AS
airflow_task_instance_state, airflow.task_instance.max_tries AS
airflow_task_instance_max_tries, airflow.task_instance.hostname AS
airflow_task_instance_hostname, airflow.task_instance.unixname AS
airflow_task_instance_unixname, airflow.task_instance.job_id AS
airflow_task_instance_job_id, airflow.task_instance.pool AS
airflow_task_instance_pool, airflow.task_instance.pool_slots AS
airflow_task_instance_pool_slots, airflow.task_instance.queue AS
airflow_task_instance_queue, airflow.task_instance.priority_weight AS
airflow_task_instance_priority_weight, airflow.task_instance.operator AS
airflow_task_instance_operator, airflow.task_instance.queued_dttm AS
airflow_task_instance_queued_dttm, airflow.task_instance.pid AS
airflow_task_instance_pid, airflow.task_instance.executor_config AS
airflow_task_instance_executor_config
FROM airflow.task_instance
WHERE airflow.task_instance.dag_id = %s AND airflow.task_instance.task_id =
%s AND airflow.task_instance.execution_date = %s
LIMIT %s FOR UPDATE]
[parameters: ('XXXXX', 'XXXXX', datetime.datetime(2020, 9, 1, 21, 10), 1)]
(Background on this error at: http://sqlalche.me/e/13/e3q8)
[2020-09-02 03:38:04,387] {taskinstance.py:1194} INFO - Marking task as
FAILED. dag_id=XXXXX, task_id=XXXXX, execution_date=20200901T211000,
start_date=20200901T211612, end_date=20200902T033804
```
**What you expected to happen**:
The task should have been marked as SUCCESS, since the bash_command exited with return code 0.
**What do you think went wrong?**:
Airflow appears to hold a SQLAlchemy connection in a checked-out state for the entire duration of the task, then reuses that connection without any health check. If the task runs longer than the MySQL `wait_timeout`, the server closes the connection and the post-task status update fails.
http://www.ilian.io/sqlalchemy-and-lost-connection-mysql-server-during-query/ (more details in the comments)
https://github.com/apache/airflow/issues/7937 (my report is probably a duplicate of this, but that bug hasn't been touched in over 4 years, and mine further applies beyond Kubernetes/PythonOperator)
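One possible mitigation (a sketch of a general pattern, not what Airflow 1.10.11 actually does) is to catch the disconnect error on the post-task status update, discard the stale connection, and retry on a fresh one. A hypothetical stand-alone sketch, where `StaleConnectionError`, `retry_on_disconnect`, and `refresh_task_instance` are all stand-in names:

```python
import functools


class StaleConnectionError(Exception):
    """Stand-in for sqlalchemy.exc.OperationalError (2006)."""


def retry_on_disconnect(func):
    """Retry once on a disconnect error (hypothetical helper)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except StaleConnectionError:
            # A real implementation would dispose of the pool here so the
            # next checkout opens a fresh connection before retrying.
            return func(*args, **kwargs)
    return wrapper


attempts = []


@retry_on_disconnect
def refresh_task_instance():
    """Simulates the post-task DB update: the first attempt hits a
    stale connection, the retry succeeds on a fresh one."""
    attempts.append(1)
    if len(attempts) == 1:
        raise StaleConnectionError("MySQL server has gone away")
    return "SUCCESS"


print(refresh_task_instance())
```

With this pattern the task state would be recorded on the second attempt instead of being marked FAILED.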
**How to reproduce it**:
1. Set up a MySQL server with wait_timeout = 300 and point Airflow to this
server to use as the backend database
2. Use CeleryExecutor
3. Run a DAG with a task that sleeps longer than `wait_timeout`:
```
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG('connection_test', start_date=datetime(2020, 9, 1))

BashOperator(
    task_id='Connection_Test',
    dag=dag,
    bash_command='sleep 10m'
)
```
**Anything else we need to know**:
I can reproduce this 100% of the time with CeleryExecutor and cannot reproduce it at all with LocalExecutor, so it may be related to how CeleryExecutor (and, based on the other report, KubernetesExecutor) handles connections during task runs.