agrubb86 opened a new issue #10700:
URL: https://github.com/apache/airflow/issues/10700


   <!--
   
   Welcome to Apache Airflow!  For a smooth issue process, try to answer the 
following questions.
   Don't worry if they're not all applicable; just try to include what you can 
:-)
   
   If you need to include code snippets or logs, please put them in fenced code
   blocks.  If they're super-long, please use the details tag like
   <details><summary>super-long log</summary> lots of stuff </details>
   
   Please delete these comment blocks before submitting the issue.
   
   -->
   
   <!--
   
   IMPORTANT!!!
   
   PLEASE CHECK "SIMILAR TO X EXISTING ISSUES" OPTION IF VISIBLE
   NEXT TO "SUBMIT NEW ISSUE" BUTTON!!!
   
   PLEASE CHECK IF THIS ISSUE HAS BEEN REPORTED PREVIOUSLY USING SEARCH!!!
   
   Please complete the next sections or the issue will be closed.
   This questions are the first thing we need to know to understand the context.
   
   -->
   
   **Apache Airflow version**: 1.10.11
   
   **Environment**: Linux aarch64
   
   - **Cloud provider or hardware configuration**: AWS
   - **OS** (e.g. from /etc/os-release): 
   NAME="Ubuntu"
   VERSION="18.04.4 LTS (Bionic Beaver)"
   ID=ubuntu
   ID_LIKE=debian
   PRETTY_NAME="Ubuntu 18.04.4 LTS"
   VERSION_ID="18.04"
   
   - **Kernel** (e.g. `uname -a`):
   Linux 4.15.0-1063-aws #67-Ubuntu SMP Mon Mar 2 07:25:24 UTC 2020 aarch64 
aarch64 aarch64 GNU/Linux
   - **Install tools**: pip3
   - **Others**:
   Config options for SQLAlchemy:
   
   ```
   sql_alchemy_conn = mysql+mysqldb://XXXXX:XXXXX@XXXXX/airflow
   sql_engine_encoding = utf-8
   sql_alchemy_pool_enabled = True
   sql_alchemy_pool_size = 10
   sql_alchemy_max_overflow = 10
   sql_alchemy_pool_recycle = 270
   sql_alchemy_pool_pre_ping = True
   ```
   
   Config options for MySQL server:
   `wait_timeout = 300`
   
   **What happened**:
   
   From the BashOperator logs:
   
   ```
   [2020-09-02 03:38:04,363] {bash_operator.py:161} INFO - Command exited with 
return code 0
   [2020-09-02 03:38:04,371] {taskinstance.py:1150} ERROR - 
(_mysql_exceptions.OperationalError) (2006, 'MySQL server has gone away')
   [SQL: SELECT airflow.task_instance.try_number AS 
airflow_task_instance_try_number, airflow.task_instance.task_id AS 
airflow_task_instance_task_id, airflow.task_instance.dag_id AS 
airflow_task_instance_dag_id, airflow.task_instance.execution_date AS 
airflow_task_instance_execution_date, airflow.task_instance.start_date AS 
airflow_task_instance_start_date, airflow.task_instance.end_date AS 
airflow_task_instance_end_date, airflow.task_instance.duration AS 
airflow_task_instance_duration, airflow.task_instance.state AS 
airflow_task_instance_state, airflow.task_instance.max_tries AS 
airflow_task_instance_max_tries, airflow.task_instance.hostname AS 
airflow_task_instance_hostname, airflow.task_instance.unixname AS 
airflow_task_instance_unixname, airflow.task_instance.job_id AS 
airflow_task_instance_job_id, airflow.task_instance.pool AS 
airflow_task_instance_pool, airflow.task_instance.pool_slots AS 
airflow_task_instance_pool_slots, airflow.task_instance.queue AS 
airflow_task_instance_que
 ue, airflow.task_instance.priority_weight AS 
airflow_task_instance_priority_weight, airflow.task_instance.operator AS 
airflow_task_instance_operator, airflow.task_instance.queued_dttm AS 
airflow_task_instance_queued_dttm, airflow.task_instance.pid AS 
airflow_task_instance_pid, airflow.task_instance.executor_config AS 
airflow_task_instance_executor_config 
   FROM airflow.task_instance 
   WHERE airflow.task_instance.dag_id = %s AND airflow.task_instance.task_id = 
%s AND airflow.task_instance.execution_date = %s 
    LIMIT %s FOR UPDATE]
   [parameters: ('XXXXX', 'XXXXX', datetime.datetime(2020, 9, 1, 21, 10), 1)]
   (Background on this error at: http://sqlalche.me/e/13/e3q8)
   Traceback (most recent call last):
     File 
"/opt/airflow/pyfiles/lib/python3.6/site-packages/sqlalchemy/engine/base.py", 
line 1278, in _execute_context
       cursor, statement, parameters, context
     File 
"/opt/airflow/pyfiles/lib/python3.6/site-packages/sqlalchemy/engine/default.py",
 line 593, in do_execute
       cursor.execute(statement, parameters)
     File 
"/opt/airflow/pyfiles/lib/python3.6/site-packages/MySQLdb/cursors.py", line 
255, in execute
       self.errorhandler(self, exc, value)
     File 
"/opt/airflow/pyfiles/lib/python3.6/site-packages/MySQLdb/connections.py", line 
50, in defaulterrorhandler
       raise errorvalue
     File 
"/opt/airflow/pyfiles/lib/python3.6/site-packages/MySQLdb/cursors.py", line 
252, in execute
       res = self._query(query)
     File 
"/opt/airflow/pyfiles/lib/python3.6/site-packages/MySQLdb/cursors.py", line 
378, in _query
       db.query(q)
     File 
"/opt/airflow/pyfiles/lib/python3.6/site-packages/MySQLdb/connections.py", line 
280, in query
       _mysql.connection.query(self, query)
   _mysql_exceptions.OperationalError: (2006, 'MySQL server has gone away')
   
   The above exception was the direct cause of the following exception:
   
   Traceback (most recent call last):
     File 
"/opt/airflow/pyfiles/lib/python3.6/site-packages/airflow/models/taskinstance.py",
 line 1003, in _run_raw_task
       self.refresh_from_db(lock_for_update=True)
     File 
"/opt/airflow/pyfiles/lib/python3.6/site-packages/airflow/utils/db.py", line 
74, in wrapper
       return func(*args, **kwargs)
     File 
"/opt/airflow/pyfiles/lib/python3.6/site-packages/airflow/models/taskinstance.py",
 line 473, in refresh_from_db
       ti = qry.with_for_update().first()
     File 
"/opt/airflow/pyfiles/lib/python3.6/site-packages/sqlalchemy/orm/query.py", 
line 3397, in first
       ret = list(self[0:1])
     File 
"/opt/airflow/pyfiles/lib/python3.6/site-packages/sqlalchemy/orm/query.py", 
line 3171, in __getitem__
       return list(res)
     File 
"/opt/airflow/pyfiles/lib/python3.6/site-packages/sqlalchemy/orm/query.py", 
line 3503, in __iter__
       return self._execute_and_instances(context)
     File 
"/opt/airflow/pyfiles/lib/python3.6/site-packages/sqlalchemy/orm/query.py", 
line 3528, in _execute_and_instances
       result = conn.execute(querycontext.statement, self._params)
     File 
"/opt/airflow/pyfiles/lib/python3.6/site-packages/sqlalchemy/engine/base.py", 
line 1014, in execute
       return meth(self, multiparams, params)
     File 
"/opt/airflow/pyfiles/lib/python3.6/site-packages/sqlalchemy/sql/elements.py", 
line 298, in _execute_on_connection
       return connection._execute_clauseelement(self, multiparams, params)
     File 
"/opt/airflow/pyfiles/lib/python3.6/site-packages/sqlalchemy/engine/base.py", 
line 1133, in _execute_clauseelement
       distilled_params,
     File 
"/opt/airflow/pyfiles/lib/python3.6/site-packages/sqlalchemy/engine/base.py", 
line 1318, in _execute_context
       e, statement, parameters, cursor, context
     File 
"/opt/airflow/pyfiles/lib/python3.6/site-packages/sqlalchemy/engine/base.py", 
line 1512, in _handle_dbapi_exception
       sqlalchemy_exception, with_traceback=exc_info[2], from_=e
     File 
"/opt/airflow/pyfiles/lib/python3.6/site-packages/sqlalchemy/util/compat.py", 
line 178, in raise_
       raise exception
     File 
"/opt/airflow/pyfiles/lib/python3.6/site-packages/sqlalchemy/engine/base.py", 
line 1278, in _execute_context
       cursor, statement, parameters, context
     File 
"/opt/airflow/pyfiles/lib/python3.6/site-packages/sqlalchemy/engine/default.py",
 line 593, in do_execute
       cursor.execute(statement, parameters)
     File 
"/opt/airflow/pyfiles/lib/python3.6/site-packages/MySQLdb/cursors.py", line 
255, in execute
       self.errorhandler(self, exc, value)
     File 
"/opt/airflow/pyfiles/lib/python3.6/site-packages/MySQLdb/connections.py", line 
50, in defaulterrorhandler
       raise errorvalue
     File 
"/opt/airflow/pyfiles/lib/python3.6/site-packages/MySQLdb/cursors.py", line 
252, in execute
       res = self._query(query)
     File 
"/opt/airflow/pyfiles/lib/python3.6/site-packages/MySQLdb/cursors.py", line 
378, in _query
       db.query(q)
     File 
"/opt/airflow/pyfiles/lib/python3.6/site-packages/MySQLdb/connections.py", line 
280, in query
       _mysql.connection.query(self, query)
   sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (2006, 
'MySQL server has gone away')
   [SQL: SELECT airflow.task_instance.try_number AS 
airflow_task_instance_try_number, airflow.task_instance.task_id AS 
airflow_task_instance_task_id, airflow.task_instance.dag_id AS 
airflow_task_instance_dag_id, airflow.task_instance.execution_date AS 
airflow_task_instance_execution_date, airflow.task_instance.start_date AS 
airflow_task_instance_start_date, airflow.task_instance.end_date AS 
airflow_task_instance_end_date, airflow.task_instance.duration AS 
airflow_task_instance_duration, airflow.task_instance.state AS 
airflow_task_instance_state, airflow.task_instance.max_tries AS 
airflow_task_instance_max_tries, airflow.task_instance.hostname AS 
airflow_task_instance_hostname, airflow.task_instance.unixname AS 
airflow_task_instance_unixname, airflow.task_instance.job_id AS 
airflow_task_instance_job_id, airflow.task_instance.pool AS 
airflow_task_instance_pool, airflow.task_instance.pool_slots AS 
airflow_task_instance_pool_slots, airflow.task_instance.queue AS 
airflow_task_instance_que
 ue, airflow.task_instance.priority_weight AS 
airflow_task_instance_priority_weight, airflow.task_instance.operator AS 
airflow_task_instance_operator, airflow.task_instance.queued_dttm AS 
airflow_task_instance_queued_dttm, airflow.task_instance.pid AS 
airflow_task_instance_pid, airflow.task_instance.executor_config AS 
airflow_task_instance_executor_config 
   FROM airflow.task_instance 
   WHERE airflow.task_instance.dag_id = %s AND airflow.task_instance.task_id = 
%s AND airflow.task_instance.execution_date = %s 
    LIMIT %s FOR UPDATE]
   [parameters: ('XXXXX', 'XXXXX', datetime.datetime(2020, 9, 1, 21, 10), 1)]
   (Background on this error at: http://sqlalche.me/e/13/e3q8)
   [2020-09-02 03:38:04,387] {taskinstance.py:1194} INFO - Marking task as 
FAILED. dag_id=XXXXX, task_id=XXXXX, execution_date=20200901T211000, 
start_date=20200901T211612, end_date=20200902T033804
   ```
   
   **What you expected to happen**:
   
   Task marked as SUCCESS due to exit code 0 from the bash_command
   
   **What do you think went wrong?**:
   
   Airflow seems to hold a SQLAlchemy connection in a checked-out state for the 
entire duration of the task and then attempts to use this connection without 
any health checks
   
   
[http://www.ilian.io/sqlalchemy-and-lost-connection-mysql-server-during-query/](url)
 (more details in the comments)
   [https://github.com/apache/airflow/issues/7937](url) (my report is probably 
a duplicate of this but this bug hasn't been touched in over 4 years, and mine 
further applies beyond Kubernetes/PythonOperator)
   
   
   **How to reproduce it**:
   1. Set up a MySQL server with wait_timeout = 300 and point Airflow to this 
server to use as the backend database
   2. Use CeleryExecutor
   3. Run this:
   ```
   BashOperator(
       task_id='Connection_Test',
       dag=dag,
       bash_command='sleep 10m'
   )
   ```
   
   **Anything else we need to know**:
   
   I am able to reproduce this 100% of the time using CeleryExecutor and unable 
to reproduce this 100% of the time using LocalExecutor, so it may be related to 
how CeleryExecutor (and KubernetesExecutor based on the other report) are 
handling connections during task runs.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to