My team and I have been experimenting with Airflow and Kubernetes, and
there has been a lot of activity recently with the Kubernetes Executor so
hopefully someone can help us out.
Specifically, we are using our own variant of the Kubernetes Executor to
run some pods on pre-emptible VMs on GKE,
and were wondering if anyone has any advice on how to handle
node pre-emptions gracefully.
Currently, if a node gets pre-empted and is removed, our pod dies, causing
the corresponding Airflow task to fail. In such cases we'd really like the
pod to be recreated and the task to continue. At the same time, we want
other 'normal' failures to still cause the Airflow task to fail.
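
To make the behaviour we are after concrete, here is a rough sketch of the
decision we would like the executor to make. It is not taken from our code
or from Airflow; PodOutcome, decide and the "node no longer exists means
pre-emption" heuristic are all hypothetical names and assumptions, and the
heuristic would misclassify a pod that failed normally just before its node
was removed.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class PodOutcome:
    """Hypothetical summary of what the executor last saw for a pod."""
    phase: str                 # e.g. "Succeeded" or "Failed"
    node_name: Optional[str]   # node the pod was scheduled on, if any


def decide(outcome: PodOutcome, live_nodes: Set[str]) -> str:
    """Return "success", "recreate" or "fail" for a finished pod.

    Assumption: if the pod's node is no longer in the cluster, we treat
    the failure as a pre-emption and recreate the pod instead of failing
    the Airflow task.
    """
    if outcome.phase == "Succeeded":
        return "success"
    if outcome.node_name is not None and outcome.node_name not in live_nodes:
        return "recreate"   # node vanished: assume pre-emption
    return "fail"           # node still there: treat as a normal failure


if __name__ == "__main__":
    nodes = {"node-a"}
    print(decide(PodOutcome("Failed", "node-b"), nodes))  # node-b is gone
    print(decide(PodOutcome("Failed", "node-a"), nodes))  # node-a is alive
```

In this sketch, only the "fail" result would be reported back to Airflow as
a task failure.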
One idea is to use Jobs instead of pods, but if I recall correctly there
was already a good deal of discussion on this topic for the Apache
Kubernetes Executor, and in the end pods were chosen.
Does anyone have any ideas about how to work with pre-emptible
VMs+GKE+Airflow? Any help is appreciated!