Amit Ghosh created AIRFLOW-3080:
-----------------------------------
Summary: Mysql OperationalError occurs during heartbeat or any DB
operation
Key: AIRFLOW-3080
URL: https://issues.apache.org/jira/browse/AIRFLOW-3080
Project: Apache Airflow
Issue Type: Bug
Components: scheduler, worker
Affects Versions: 1.10.0
Reporter: Amit Ghosh
Assignee: Amit Ghosh
When airflow uses mysql and airflow has many worker instances and no dag was
executed for a long time mysql gives "mysql_exceptions.OperationalError".
Main issue is if connections become stale for a long time, first db request
gives this error because mysql marks connection as stale after some time if no
connection has happened to db from a given sqlachemy pool. I am working on a
fix and will commit it and that should work in case of other databases also.
1) Log Text = \{"log":"[2018-09-18 05:33:45,296] {jobs.py:748} ERROR -
(_mysql_exceptions.OperationalError) (2005, \"Unknown MySQL server host
'mlp.prod.machine-learning-platform-prod.ms-df-cloudrdbms.prod.walmart.com'
(2)\") (Background on this error at: [http://sqlalche.me/e/e3q8])
","stream":"stdout","time":"2018-09-18T05:33:45.315547946Z"}
2) Log Text = {"log":" raise errorvalue
","stream":"stderr","time":"2018-09-15T06:04:35.722310847Z"}
{"log":"sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError)
(2013, 'Lost connection to MySQL server during query') [SQL: u'UPDATE job SET
latest_heartbeat=%s WHERE job.id = %s'] [parameters: (datetime.datetime(2018,
9, 15, 6, 4, 23, 4294), 345143L)] (Background on this error at:
[http://sqlalche.me/e/e3q8])
","stream":"stderr","time":"2018-09-15T06:04:35.72232954Z"}
{"log":"[2018-09-15 06:04:35,844: ERROR/ForkPoolWorker-13] Command 'airflow run
dag_2063_baf60054-d0c7-41b2-8009-4d88f773dc79 web_crawl_pipeline
2018-09-14T05:48:42 --local -sd
DAGS_FOLDER/1833_workflows/dag_2063_baf60054-d0c7-41b2-8009-4d88f773dc79.py '
returned non-zero exit status 1
","stream":"stderr","time":"2018-09-15T06:04:35.847747612Z"}
{"log":"[2018-09-15 06:04:35,851: ERROR/ForkPoolWorker-13] Task
airflow.executors.celery_executor.execute_command[30141a5a-71da-4d28-a829-495aeca3cfa9]
raised unexpected: AirflowException('Celery command failed',)
","stream":"stderr","time":"2018-09-15T06:04:35.855019453Z"}
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)