Hi all,

We run DAGs on Airflow, and sometimes Airflow itself crashes (for whatever reason; it can be something as simple as the underlying infrastructure going down).
Currently we run everything on Kubernetes (including Airflow), so crashes of the Airflow pods will generally be detected and the pods restarted. However, suppose a DAG is running task X when Airflow crashes. When Airflow comes back up, it apparently sees that task X didn't complete, so it restarts the task, which in our case means spinning up an entirely new instance/pod. As a result, both runs, "X_1" and "X_2", end up executing simultaneously.

Is there any out-of-the-box way to better reconcile state between tasks and Airflow to prevent this?

(For additional context, we currently execute Kubernetes Jobs via a custom operator that basically layers on top of BashOperator; I've pasted a simplified sketch below my sign-off. Perhaps the new Kubernetes operator, KubernetesPodOperator, will help address this?)

Thank you in advance for any thoughts,
Chris
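P.S. For concreteness, here is a rough sketch of the pattern our custom operator follows. This is illustrative only: the class name, arguments, and kubectl commands are made up for this email, not our actual code. The relevant point is that the Job is created unconditionally on every task attempt, so nothing notices a Job left over from a previous attempt.

    from airflow.operators.bash_operator import BashOperator


    class KubernetesJobOperator(BashOperator):
        """Fire-and-wait wrapper around kubectl (illustrative sketch)."""

        def __init__(self, job_name, manifest_path, **kwargs):
            bash_command = (
                # Create the Job, then block until it reports completion.
                # (In practice the Job name carries a per-run suffix, which
                # is why a restarted task launches a brand-new Job instead
                # of erroring out on the one that is still running.)
                "kubectl create -f {manifest} && "
                # A negative timeout makes kubectl wait indefinitely rather
                # than giving up after the 30-second default.
                "kubectl wait --for=condition=complete --timeout=-1s job/{job}"
            ).format(manifest=manifest_path, job=job_name)
            super(KubernetesJobOperator, self).__init__(
                bash_command=bash_command, **kwargs
            )

In a DAG it gets used like any other operator, e.g. KubernetesJobOperator(task_id="run_x", job_name="x", manifest_path="/jobs/x.yaml", dag=dag).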