Hi Everyone,

For the last two weeks, we've been facing an issue with a LocalExecutor setup of 
Airflow v1.9 (MySQL as the metastore): in a DAG where retries are configured, 
if the initial try_number fails, then roughly 8 times out of 10 the task gets 
stuck in the up_for_retry state. In fact, no running state ever appears after 
Scheduled > Queued in the TI. The entry in the Job table is marked successful 
within a fraction of a second, a failed entry gets logged in the task_fail 
table without the task ever reaching the operator code, and as a result we get 
an email alert saying:

```
Try 2 out of 4
Exception:
Executor reports task instance %s finished (%s) although the task says its %s. 
Was the task killed externally?
```
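If it helps frame the question, here is a minimal standalone sketch of the 
state mismatch we believe the scheduler is detecting (all names here are 
illustrative, not Airflow's actual internals): the executor reports that the 
worker process finished while the task instance row in the metastore still 
says QUEUED, so the scheduler concludes the task died externally.

```python
# Illustrative model of the mismatch behind our alert -- NOT Airflow source.
QUEUED, RUNNING, SUCCESS, FAILED, UP_FOR_RETRY = (
    "queued", "running", "success", "failed", "up_for_retry")

def process_executor_event(ti_state_in_db, executor_reported_state):
    """Mirror the scheduler-side consistency check, as we understand it:
    if the executor says the process finished but the metastore row never
    reached RUNNING, the scheduler assumes the task was killed externally."""
    if executor_reported_state in (SUCCESS, FAILED) and ti_state_in_db == QUEUED:
        return ("Executor reports task instance finished (%s) although the "
                "task says its %s. Was the task killed externally?"
                % (executor_reported_state, ti_state_in_db))
    return None

# The failing path we observe: the worker exits (executor reports a terminal
# state) before the task instance ever records its transition to RUNNING.
print(process_executor_event(QUEUED, SUCCESS))
```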

However, when we changed job_heartbeat_sec from its default of 5 seconds to 30 
seconds (a value Max mentioned back in 2016 for healthy supervision: 
https://groups.google.com/forum/#!topic/airbnb_airflow/hTXKFw2XFx0), the issue 
stops occurring. But we're still clueless about how this new configuration 
actually solved/suppressed the issue; any key information around it would 
really help here.
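
For reference, this is the exact change we applied in airflow.cfg (the 
job_heartbeat_sec key lives in the [scheduler] section):

```
[scheduler]
# Default is 5 seconds. Raising it seems to give the forked task process
# more time to record its RUNNING state before the supervising job's
# heartbeat re-checks it -- though that is exactly the part we'd like
# someone to confirm or correct.
job_heartbeat_sec = 30
```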

Regards,
Vardan Gupta 
