Hi all,

We run DAGs on Airflow, and sometimes Airflow itself crashes (for whatever reason; it can be something as simple as the underlying infrastructure going down).
Currently we run everything on Kubernetes (including Airflow), so crashes of the Airflow pods will generally be detected and the pods restarted. However, suppose a DAG is running task X when Airflow crashes. When Airflow comes back up, it apparently sees that task X didn't complete, so it restarts the task, which in our case means spinning up an entirely new instance/pod. As a result, both runs, "X_1" and "X_2", end up executing simultaneously.

Is there any out-of-the-box way to better reconcile state between tasks and Airflow to prevent this?

(For additional context, we currently execute Kubernetes Jobs via a custom operator that basically layers on top of BashOperator; I've pasted a simplified sketch below my sign-off. Perhaps the new Kubernetes operator, KubernetesPodOperator, will help address this?)

Thank you in advance for any thoughts,
Chris
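P.S. For concreteness, here is a rough sketch of the pattern our custom operator follows. This is illustrative only: the class name, arguments, and kubectl commands are made up for this email, not our actual code. The relevant point is that the Job is created unconditionally on every task attempt, so nothing notices a Job left over from a previous attempt.

    from airflow.operators.bash_operator import BashOperator


    class KubernetesJobOperator(BashOperator):
        """Fire-and-wait wrapper around kubectl (illustrative sketch)."""

        def __init__(self, job_name, manifest_path, **kwargs):
            bash_command = (
                # Create the Job, then block until it reports completion.
                # (In practice the Job name carries a per-run suffix, which
                # is why a restarted task launches a brand-new Job instead
                # of erroring out on the one that is still running.)
                "kubectl create -f {manifest} && "
                # A negative timeout makes kubectl wait indefinitely rather
                # than giving up after the 30-second default.
                "kubectl wait --for=condition=complete --timeout=-1s job/{job}"
            ).format(manifest=manifest_path, job=job_name)
            super(KubernetesJobOperator, self).__init__(
                bash_command=bash_command, **kwargs
            )

In a DAG it gets used like any other operator, e.g. KubernetesJobOperator(task_id="run_x", job_name="x", manifest_path="/jobs/x.yaml", dag=dag).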