James Davidheiser created AIRFLOW-2827:
------------------------------------------

             Summary: Tasks that fail with spurious Celery issues are not retried
                 Key: AIRFLOW-2827
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2827
             Project: Apache Airflow
          Issue Type: Wish
            Reporter: James Davidheiser


We have a DAG with ~500 tasks, running on Airflow set up in Kubernetes with 
RabbitMQ, using a setup derived heavily from 
[https://github.com/mumoshu/kube-airflow]. Occasionally, we hit spurious 
Celery execution failures (possibly related to #2011), resulting in the 
worker throwing errors that look like this:

 

```
[2018-07-30 11:04:26,812: ERROR/ForkPoolWorker-9] Task airflow.executors.celery_executor.execute_command[462de800-ad3f-4151-90bf-9155cc6c66f6] raised unexpected: AirflowException('Celery command failed',)
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 382, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/celery/app/trace.py", line 641, in __protected_call__
    return self.run(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/airflow/executors/celery_executor.py", line 55, in execute_command
    raise AirflowException('Celery command failed')
AirflowException: Celery command failed
```

 

When these tasks fail, they send a "task failed" email that contains very 
little information about the failure.  The logs for the task run are empty, 
because the task never actually ran; the error was generated by the worker.  
The task also does not retry, so if something goes wrong with Celery, the 
task simply fails outright instead of trying again.
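For illustration, here is a minimal, self-contained Python sketch of the behavior visible in the traceback and of the kind of retry this issue is asking for. This is not Airflow's actual code: the `AirflowException` stub and the `execute_with_retries` wrapper are hypothetical, standing in for the real `execute_command` in `airflow/executors/celery_executor.py`.

```python
import subprocess


class AirflowException(Exception):
    """Stand-in for airflow.exceptions.AirflowException."""


def execute_command(command):
    # Mirrors the behavior seen in the traceback: when the command
    # fails, the worker raises AirflowException and the task is
    # marked failed with no retry and no task log.
    try:
        subprocess.check_call(command, shell=True)
    except subprocess.CalledProcessError:
        raise AirflowException('Celery command failed')


def execute_with_retries(command, max_retries=3):
    # Hypothetical wrapper showing the requested behavior: a spurious
    # Celery-level failure is retried a few times before giving up.
    for attempt in range(1, max_retries + 1):
        try:
            return execute_command(command)
        except AirflowException:
            if attempt == max_retries:
                raise
```

A wrapper like this only masks transient worker-side failures; persistent failures still surface as `AirflowException` after the final attempt.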

 

This may be the same issue reported in #1844, but I am not sure because there 
is not much detail there.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
