Upon further internal discussion, we suspect the task cloning may be happening because the Postgres DB is getting into a corrupted state, but that's still unclear. If the consensus is that we *shouldn't* be seeing this behavior even as-is, we'll push more on that angle.
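In case it helps the discussion, here is a rough sketch of the kind of idempotency guard we've been considering for our custom operator. It is only a sketch under assumptions: the class name DeduplicatedK8sJobOperator and the helper _job_name are hypothetical, and it assumes the official `kubernetes` Python client with in-cluster config. The idea is to derive a deterministic Job name from the DAG/task/execution date, so that if Airflow comes back up and re-fires task X, the operator finds the already-running Job instead of spinning up a second pod:

import logging
import re

from airflow.models import BaseOperator
from airflow.utils.decorators import apply_defaults
from kubernetes import client, config
from kubernetes.client.rest import ApiException


class DeduplicatedK8sJobOperator(BaseOperator):
    """Launch a Kubernetes Job, but adopt an identically named Job if one
    already exists (e.g. because Airflow crashed and re-ran the task)."""

    @apply_defaults
    def __init__(self, job_manifest, namespace='default', *args, **kwargs):
        super(DeduplicatedK8sJobOperator, self).__init__(*args, **kwargs)
        self.job_manifest = job_manifest  # assumed: plain dict with a 'metadata' key
        self.namespace = namespace

    def _job_name(self, context):
        # Deterministic name per (dag, task, execution_date), so the re-run
        # after a crash maps onto the same Job object. Kubernetes names must
        # be lowercase DNS-1123, hence the sanitization.
        raw = '{}-{}-{}'.format(
            self.dag_id, self.task_id,
            context['execution_date'].strftime('%Y%m%d%H%M%S'))
        return re.sub(r'[^a-z0-9-]', '-', raw.lower())

    def execute(self, context):
        config.load_incluster_config()  # we run Airflow inside the cluster
        batch = client.BatchV1Api()
        name = self._job_name(context)
        try:
            batch.read_namespaced_job(name=name, namespace=self.namespace)
            logging.info('Job %s already exists; not launching a duplicate.',
                         name)
            return  # a real version would watch/poll the existing Job here
        except ApiException as e:
            if e.status != 404:
                raise  # genuine API error, not just "Job not found"
        self.job_manifest['metadata']['name'] = name
        batch.create_namespaced_job(namespace=self.namespace,
                                    body=self.job_manifest)

(This obviously only covers the "don't launch a duplicate" half; we'd still need to attach to and monitor the existing Job so the task instance reflects its real state.)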
On Sun, Dec 17, 2017 at 10:45 AM, Christopher Bockman <ch...@fathomhealth.co> wrote:
> Hi all,
>
> We run DAGs, and sometimes Airflow crashes (for whatever reason--maybe
> something as simple as the underlying infrastructure going down).
>
> Currently, we run everything on Kubernetes (including Airflow), so the
> Airflow pod crashes will generally be detected, and then they will restart.
>
> However, if we have, e.g., a DAG that is running task X when it crashes,
> when Airflow comes back up, it apparently sees task X didn't complete, so
> it restarts the task (which, in this case, means it spins up an entirely
> new instance/pod). Thus, both runs "X_1" and "X_2" are fired off
> simultaneously.
>
> Is there any (out of the box) way to better connect up state between tasks
> and Airflow to prevent this?
>
> (For additional context, we currently execute Kubernetes jobs via a custom
> operator that basically layers on top of BashOperator...perhaps the new
> Kubernetes operator will help address this?)
>
> Thank you in advance for any thoughts,
>
> Chris