Hi all,

My team and I have been experimenting with Airflow and Kubernetes, and since there has been a lot of recent activity around the Kubernetes Executor, hopefully someone can help us out.
Specifically, we are using our own variant of the Kubernetes Executor to run some pods on pre-emptible VMs on GKE (https://cloud.google.com/kubernetes-engine/docs/concepts/preemptible-vm), and we were wondering if anyone had any advice on how to handle node pre-emptions gracefully.

Currently, if a node gets pre-empted and removed, our pod dies, which causes the corresponding Airflow task to fail. In such cases we'd really like the pod to be recreated and the task to continue. At the same time, we want other 'normal' failures to still cause the Airflow task to fail.

One idea is to use jobs instead of pods, but if I recall correctly there was already a lot of discussion on this topic for the Apache Kubernetes Executor, and in the end pods were chosen.

Does anyone have any ideas about how to work with pre-emptible VMs + GKE + Airflow? Any help is appreciated!

Thanks,
Kevin
