James Davidheiser created AIRFLOW-2827:
------------------------------------------
Summary: Tasks that fail with spurious Celery issues are not retried
Key: AIRFLOW-2827
URL: https://issues.apache.org/jira/browse/AIRFLOW-2827
Project: Apache Airflow
Issue Type: Wish
Reporter: James Davidheiser
We have a DAG with ~500 tasks, running on Airflow set up in Kubernetes with
RabbitMQ, using a setup derived pretty heavily from
[https://github.com/mumoshu/kube-airflow]. Occasionally, we hit some
spurious Celery execution failures (possibly related to #2011), resulting in
the worker throwing errors that look like this:
```
[2018-07-30 11:04:26,812: ERROR/ForkPoolWorker-9] Task airflow.executors.celery_executor.execute_command[462de800-ad3f-4151-90bf-9155cc6c66f6] raised unexpected: AirflowException('Celery command failed',)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 382, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 641, in __protected_call__
    return self.run(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/airflow/executors/celery_executor.py", line 55, in execute_command
    raise AirflowException('Celery command failed')
AirflowException: Celery command failed
```
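For context, the code path raising this error can be sketched roughly as below. This is a simplified reconstruction of what the worker-side `execute_command` in `celery_executor.py` appears to do based on the traceback, not the exact Airflow source: the worker shells out to the task command, and any nonzero exit is collapsed into a bare `AirflowException`, before the task instance itself ever runs.

```python
# Simplified sketch (an assumption, not the verbatim Airflow source) of the
# worker-side command execution shown in the traceback above.
import subprocess


class AirflowException(Exception):
    """Stand-in for airflow.exceptions.AirflowException."""


def execute_command(command):
    # The Celery worker shells out to the task command (an `airflow run ...`
    # invocation). If the subprocess fails -- e.g. due to a spurious Celery
    # or broker problem -- no task instance ever started, so the task logs
    # stay empty and the task's own retry settings are never consulted.
    try:
        subprocess.check_call(command, shell=True)
    except subprocess.CalledProcessError:
        raise AirflowException('Celery command failed')
```

This would explain why the failure email carries so little detail: the exception is raised at the executor layer, where the original subprocess error is discarded.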
When these tasks fail, they send a "task failed" email that contains very
little information about the state of the failure. The logs for the task run
are empty, because the task never actually did anything and the error message
was generated by the worker. Also, the task does not retry, so if something
goes wrong with Celery, the task simply fails outright instead of trying
again. This may be the same issue reported in #1844, but I am not sure because
there is not much detail there.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)