[
https://issues.apache.org/jira/browse/AIRFLOW-2827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
James Davidheiser updated AIRFLOW-2827:
---------------------------------------
Issue Type: Bug (was: Wish)
> Tasks that fail with spurious Celery issues are not retried
> -----------------------------------------------------------
>
> Key: AIRFLOW-2827
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2827
> Project: Apache Airflow
> Issue Type: Bug
> Reporter: James Davidheiser
> Priority: Major
>
> We have a DAG with ~500 tasks, running on Airflow deployed in Kubernetes with
> RabbitMQ, using a setup derived heavily from
> https://github.com/mumoshu/kube-airflow. Occasionally, we hit
> spurious Celery execution failures (possibly related to #2011), which cause
> the worker to throw errors like this:
>
> ```
> [2018-07-30 11:04:26,812: ERROR/ForkPoolWorker-9] Task airflow.executors.celery_executor.execute_command[462de800-ad3f-4151-90bf-9155cc6c66f6] raised unexpected: AirflowException('Celery command failed',)
> Traceback (most recent call last):
>   File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 382, in trace_task
>     R = retval = fun(*args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 641, in __protected_call__
>     return self.run(*args, **kwargs)
>   File "/usr/local/lib/python2.7/dist-packages/airflow/executors/celery_executor.py", line 55, in execute_command
>     raise AirflowException('Celery command failed')
> AirflowException: Celery command failed
> ```
>
> When these tasks fail, they send a "task failed" email that contains very
> little information about the failure. The logs for the task run are empty,
> because the task never actually ran and the error message was generated by
> the worker. The task is also not retried, so if something goes wrong with
> Celery, it simply fails outright instead of trying again.
>
> This may be the same issue reported in #1844, but I am not sure because there
> is not much detail there.
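
The failure path in the traceback above can be illustrated with a minimal, self-contained sketch. It assumes the worker-side `execute_command` wrapper runs the task command as a subprocess and raises a generic exception when the child exits non-zero, which would explain why the task's own log stays empty. The `AirflowException` class here is a local stand-in, and `execute_with_retries` is a hypothetical helper showing the retry behavior the report asks for; neither is Airflow's actual code.

```python
import subprocess
import sys


class AirflowException(Exception):
    """Local stand-in for Airflow's exception class (assumption: Airflow is not imported)."""


def execute_command(command):
    """Run a command in a subprocess and raise a generic error if it exits non-zero."""
    try:
        subprocess.check_call(command, stderr=subprocess.STDOUT, close_fds=True)
    except subprocess.CalledProcessError:
        # Only the exit status is known here, so the error carries no task-level detail.
        raise AirflowException('Celery command failed')


def execute_with_retries(command, attempts=3):
    """Hypothetical helper: re-run the command when the wrapper itself fails.

    Returns the 1-based attempt number that succeeded, or re-raises the last error.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            execute_command(command)
            return attempt
        except AirflowException as err:
            last_error = err
    raise last_error


if __name__ == "__main__":
    # A child process that exits with status 1 stands in for a spurious failure.
    failing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    try:
        execute_with_retries(failing, attempts=2)
    except AirflowException as err:
        print("gave up after retries:", err)
```

In this sketch the retry happens inside the worker; whether a retry should instead go through the scheduler's normal task retry handling is a design question the report leaves open.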