[ 
https://issues.apache.org/jira/browse/AIRFLOW-4527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16862889#comment-16862889
 ] 

Bharath Palaksha commented on AIRFLOW-4527:
-------------------------------------------

[~ash], please find the steps to reproduce in the comment above. 
The issue is that find_zombies() detects the tasks that are in a zombie state, 
but when that list is passed to the DAG processing process, the list of zombies 
is empty.
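
The issue description quoted below notes that the MySQL connection error is 
neither retried nor handled. A minimal sketch of what such a retry could look 
like follows. It is not Airflow's code: TransientDBError and call_with_retry 
are hypothetical names used only for illustration, and a real version would 
catch the driver's own exception (for example the OperationalError in the log 
below) instead.

```python
import time


class TransientDBError(Exception):
    """Hypothetical stand-in for a driver error such as a lost connection."""


def call_with_retry(func, retries=3, base_delay=0.5, sleep=time.sleep):
    """Call func(), retrying on TransientDBError with exponential backoff.

    The last error is re-raised once all attempts are used up, so the caller
    can mark the task as failed instead of leaving it in the running state.
    """
    for attempt in range(retries):
        try:
            return func()
        except TransientDBError:
            if attempt == retries - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)  # wait 0.5s, 1s, 2s, ...
```

Under these assumptions, a call such as the status refresh would be wrapped as 
call_with_retry(lambda: ti.refresh_from_db()), with the except clause changed 
to the real driver exception.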

> Connection error while calling refreshfromdb() makes the task stuck in 
> running state
> ------------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-4527
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-4527
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: database
>            Reporter: Bharath Palaksha
>            Assignee: Bharath Palaksha
>            Priority: Major
>              Labels: mysql
>             Fix For: 1.10.2
>
>
> {{I have set up Airflow with MySQL as the metastore. When there is a network 
> issue and a task fails with a network connection reset exception, Airflow 
> tries to refresh the status from the database and gets a connection error. 
> This results in the task getting stuck in the running state.}}
> {{There is no retry for the MySQL connection error, and the exception is 
> never handled.}}
> If worker nodes are unable to reach MySQL to update task status, the 
> scheduler node should handle this scenario and mark those tasks as failed. 
> Tasks shouldn't be stuck in the running state forever.
>  
>  Scheduler heartbeat got an exception: (MySQLdb._exceptions.OperationalError) 
> (2013, "Lost connection to MySQL server at 'reading authorization packet', 
> system error: 104") (Background on this error at: [http://sqlalche.me/e/e3q8])
> {base_task_runner.py:101} INFO - Job 989226: Subtask count_cust_shipped_data Traceback (most recent call last):
> {base_task_runner.py:101} INFO - Job 989226: Subtask count_cust_shipped_data File "/usr/local/bin/airflow", line 32, in <module>
> {base_task_runner.py:101} INFO - Job 989226: Subtask count_cust_shipped_data args.func(args)
> {base_task_runner.py:101} INFO - Job 989226: Subtask count_cust_shipped_data File "/usr/local/lib/python2.7/site-packages/airflow/utils/cli.py", line 74, in wrapper
> {base_task_runner.py:101} INFO - Job 989226: Subtask count_cust_shipped_data return f(*args, **kwargs)
> {base_task_runner.py:101} INFO - Job 989226: Subtask count_cust_shipped_data File "/usr/local/lib/python2.7/site-packages/airflow/bin/cli.py", line 526, in run
> {base_task_runner.py:101} INFO - Job 989226: Subtask count_cust_shipped_data _run(args, dag, ti)
> {base_task_runner.py:101} INFO - Job 989226: Subtask count_cust_shipped_data File "/usr/local/lib/python2.7/site-packages/airflow/bin/cli.py", line 445, in _run
> {base_task_runner.py:101} INFO - Job 989226: Subtask count_cust_shipped_data pool=args.pool,
> {base_task_runner.py:101} INFO - Job 989226: Subtask count_cust_shipped_data File "/usr/local/lib/python2.7/site-packages/airflow/utils/db.py", line 73, in wrapper
> {base_task_runner.py:101} INFO - Job 989226: Subtask count_cust_shipped_data return func(*args, **kwargs)
> {base_task_runner.py:101} INFO - Job 989226: Subtask count_cust_shipped_data File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1692, in _run_raw_task
> {base_task_runner.py:101} INFO - Job 989226: Subtask count_cust_shipped_data self.refresh_from_db()
> {base_task_runner.py:101} INFO - Job 989226: Subtask count_cust_shipped_data File "/usr/local/lib/python2.7/site-packages/airflow/utils/db.py", line 73, in wrapper
> {base_task_runner.py:101} INFO - Job 989226: Subtask count_cust_shipped_data return func(*args, **kwargs)
> {base_task_runner.py:101} INFO - Job 989226: Subtask count_cust_shipped_data File "/usr/local/lib/python2.7/site-packages/airflow/models.py", line 1218, in refresh_from_db
> {base_task_runner.py:101} INFO - Job 989226: Subtask count_cust_shipped_data ti = qry.first()
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
