Hi all,

We run DAGs, and sometimes Airflow crashes (for whatever reason--maybe
something as simple as the underlying infrastructure going down).

Currently, we run everything on Kubernetes (including Airflow), so crashes of
the Airflow pods are generally detected and the pods are restarted.

However, if we have, e.g., a DAG that is running task X when Airflow crashes,
then when Airflow comes back up it apparently sees that task X didn't
complete, so it restarts the task (which, in this case, means it spins up an
entirely new instance/pod).  Thus both runs, "X_1" and "X_2", end up
executing simultaneously.
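
One workaround I've been sketching (purely illustrative; this is not our
actual operator, and the function/variable names are made up) is to give each
Kubernetes Job a deterministic name derived from the task instance, so that
if Airflow re-fires the task it finds the Job that is already running instead
of creating a second one:

    # Sketch only: assumes the official `kubernetes` Python client and that
    # the Job spec is built elsewhere; names here are hypothetical.
    from kubernetes import client, config
    from kubernetes.client.rest import ApiException

    def launch_or_attach(job_spec, dag_id, task_id, execution_date,
                         namespace="default"):
        """Create the Job only if one for this task instance doesn't exist yet."""
        config.load_incluster_config()  # we run inside the cluster
        batch = client.BatchV1Api()

        # Same task instance -> same Job name, so a restarted Airflow finds
        # the original run instead of launching a duplicate.
        job_name = "{}-{}-{:%Y%m%dt%H%M%S}".format(
            dag_id, task_id, execution_date).lower()

        try:
            # If the Job already exists, return it and poll it to completion.
            return batch.read_namespaced_job(name=job_name, namespace=namespace)
        except ApiException as e:
            if e.status != 404:
                raise

        job_spec.metadata = client.V1ObjectMeta(name=job_name)
        return batch.create_namespaced_job(namespace=namespace, body=job_spec)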

Rather than hand-rolling something like that, is there any out-of-the-box way
to better connect up state between tasks and Airflow to prevent this?

(For additional context, we currently execute Kubernetes jobs via a custom
operator that basically layers on top of BashOperator...perhaps the new
Kubernetes operator will help address this?)
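
In case it helps frame the question, here is roughly how I picture the new
operator replacing our BashOperator wrapper (untested sketch; the image,
names, and command are placeholders), though I don't know whether it actually
changes the behaviour on an Airflow restart:

    # Untested sketch of swapping our BashOperator wrapper for the
    # KubernetesPodOperator; image, names, and command are placeholders.
    from datetime import datetime
    from airflow import DAG
    from airflow.contrib.operators.kubernetes_pod_operator import (
        KubernetesPodOperator,
    )

    dag = DAG(
        dag_id="example_k8s_job",
        start_date=datetime(2018, 1, 1),
        schedule_interval="@daily",
    )

    run_x = KubernetesPodOperator(
        task_id="run_x",
        name="run-x",
        namespace="default",
        image="our-registry/our-image:latest",  # placeholder image
        cmds=["python", "-m", "our_job"],       # placeholder entrypoint
        in_cluster=True,
        get_logs=True,
        dag=dag,
    )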

Thank you in advance for any thoughts,

Chris
