[ https://issues.apache.org/jira/browse/AIRFLOW-4527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16862889#comment-16862889 ]
Bharath Palaksha commented on AIRFLOW-4527:
-------------------------------------------

[~ash], please find the reproducing steps in the comment above. The issue is that find_zombies() detects tasks that are in the zombie state, but when the list is passed to the DAG processing process, the list of zombies is empty.

> Connection error while calling refreshfromdb() makes the task stuck in running state
> ------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-4527
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-4527
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: database
>            Reporter: Bharath Palaksha
>            Assignee: Bharath Palaksha
>            Priority: Major
>              Labels: mysql
>             Fix For: 1.10.2
>
>
> I have set up Airflow with MySQL as the metastore. When there is a network issue and a task fails with a network connection reset exception, Airflow tries to refresh the task status from the database and gets a connection error. This results in the task getting stuck in the running state.
> There is no retry for the MySQL connection error, and the exception is never handled.
> If worker nodes are unable to reach MySQL to update the task status, the scheduler node should handle this scenario and mark those tasks as failed. Tasks shouldn't be stuck in the running state forever.
>
> Scheduler heartbeat got an exception: (MySQLdb._exceptions.OperationalError) (2013, "Lost connection to MySQL server at 'reading authorization packet', system error: 104") (Background on this error at: [http://sqlalche.me/e/e3q8])
> {base_task_runner.py:101} INFO - Job 989226: Subtask count_cust_shipped_data Traceback (most recent call last):
> {base_task_runner.py:101} INFO - Job 989226: Subtask count_cust_shipped_data   File "/usr/local/bin/airflow", line 32, in <module>
> {base_task_runner.py:101} INFO - Job 989226: Subtask count_cust_shipped_data     args.func(args)
> {base_task_runner.py:101} INFO - Job 989226: Subtask count_cust_shipped_data   File "/usr/local/lib/python2.7/site-packages/airflow/utils/cli.py", line 74, in wrapper
> {base_task_runner.py:101} INFO - Job 989226: Subtask count_cust_shipped_data     return f(*args, **kwargs)
> {base_task_runner.py:101} INFO - Job 989226: Subtask count_cust_shipped_data   File "/usr/local/lib/python2.7/site-packages/airflow/bin/cli.py", line 526, in run
> {base_task_runner.py:101} INFO - Job 989226: Subtask count_cust_shipped_data     _run(args, dag, ti)
> {base_task_runner.py:101} INFO - Job 989226: Subtask count_cust_shipped_data   File "/usr/local/lib/python2.7/site-packages/airflow/bin/cli.py", line 445, in _run
> {base_task_runner.py:101} INFO - Job 989226: Subtask count_cust_shipped_data     pool=args.pool,
> {base_task_runner.py:101} INFO - Job 989226: Subtask count_cust_shipped_data   File "/usr/local/lib/python2.7/site-packages/airflow/utils/db.py", line 73, in wrapper
> {base_task_runner.py:101} INFO - Job 989226: Subtask count_cust_shipped_data     return func(*args, **kwargs)
> {base_task_runner.py:101} INFO - Job 989226: Subtask count_cust_shipped_data   File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1692, in _run_raw_task
> {base_task_runner.py:101} INFO - Job 989226: Subtask count_cust_shipped_data     self.refresh_from_db()
> {base_task_runner.py:101} INFO - Job 989226: Subtask count_cust_shipped_data   File "/usr/local/lib/python2.7/site-packages/airflow/utils/db.py", line 73, in wrapper
> {base_task_runner.py:101} INFO - Job 989226: Subtask count_cust_shipped_data     return func(*args, **kwargs)
> {base_task_runner.py:101} INFO - Job 989226: Subtask count_cust_shipped_data   File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1218, in refresh_from_db
> {base_task_runner.py:101} INFO - Job 989226: Subtask count_cust_shipped_data     ti = qry.first()

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
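
The issue description notes that there is no retry when the MySQL connection is lost during refresh_from_db(). As a rough illustration of what a retry could look like, here is a minimal sketch. It is not Airflow code and not the fix applied for this issue: call_with_retry, TransientDBError and make_flaky_refresh are hypothetical names invented for this example, and TransientDBError merely stands in for a driver error such as the OperationalError in the log above.

```python
import time


class TransientDBError(Exception):
    """Hypothetical stand-in for a driver error such as MySQLdb's OperationalError."""


def call_with_retry(func, retries=3, base_delay=0.5,
                    retry_on=(TransientDBError,), sleep=time.sleep):
    """Call func(); on a retryable error, back off and try again.

    The last error is re-raised once `retries` attempts have failed, so the
    caller can mark the task as failed instead of leaving it in 'running'.
    """
    for attempt in range(1, retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == retries:
                raise
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))


def make_flaky_refresh(failures):
    """Return a fake refresh (hypothetical) that fails `failures` times, then succeeds."""
    state = {"calls": 0}

    def refresh():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise TransientDBError("Lost connection to MySQL server")
        return "running"

    return refresh, state
```

For example, call_with_retry(refresh, retries=3) with a refresh that fails twice would return on the third attempt. If every attempt fails, the exception propagates, which gives the caller a chance to handle it explicitly rather than leaving the task state untouched.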