Hi Kevin,

I can answer in more detail in about an hour, but here are a few quick
points:

1. We decided that Kubernetes Jobs are pretty risky when it comes to Airflow
deployments (imagine a pod that launches a Spark job and retries
infinitely). However, Airflow already lets you define a retry policy per
task, so retries can be handled there instead.

2. We specifically tried to prevent DAG failures caused by pods dying, but
I think we didn't account for pods dying mid-task (or we assumed people
would simply restart the task). I think this could be a PR against the
executor.
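
For point 2, here is a rough sketch of the kind of check an executor could make when a pod finishes. Everything in it is hypothetical: `PodOutcome`, `next_action`, and the reason strings in `PREEMPTION_REASONS` are placeholders, not Airflow or Kubernetes client APIs, and you would need to check which status reason your GKE version actually reports for a pod on a pre-empted node. The resubmit cap is there because of the infinite-retry worry from point 1; `retries` and `retry_delay` are the standard Airflow task arguments that still handle ordinary failures.

```python
from dataclasses import dataclass
from datetime import timedelta

# ASSUMPTION: pod status reasons we treat as "the node was pre-empted"
# rather than "the task failed". The real strings vary by Kubernetes/GKE
# version; verify them against your own cluster before relying on this.
PREEMPTION_REASONS = frozenset({"NodeLost", "Shutdown"})

# Airflow's own per-task retry policy for ordinary failures
# (`retries` and `retry_delay` are standard BaseOperator arguments).
DEFAULT_ARGS = {
    "retries": 2,
    "retry_delay": timedelta(minutes=5),
}

TERMINAL_PHASES = ("Succeeded", "Failed")


@dataclass(frozen=True)
class PodOutcome:
    """Hypothetical, minimal view of a pod's state (not the client's V1Pod)."""
    phase: str                  # Kubernetes pod phase, e.g. "Running", "Failed"
    reason: str = ""            # pod status reason, if one was reported
    pod_missing: bool = False   # pod object disappeared along with its node


def next_action(outcome: PodOutcome, resubmits: int, max_resubmits: int = 3) -> str:
    """Return "success", "wait", "resubmit" or "fail" for a watched pod.

    Pre-empted pods are recreated at most `max_resubmits` times, so a pod can
    never be resubmitted forever. Past that cap, or for any other failure,
    the task is failed and Airflow's normal `retries` policy takes over.
    """
    if outcome.phase == "Succeeded":
        return "success"
    # Still pending/running (or unknown) and the pod still exists: keep watching.
    if not outcome.pod_missing and outcome.phase not in TERMINAL_PHASES:
        return "wait"
    preempted = outcome.pod_missing or outcome.reason in PREEMPTION_REASONS
    if preempted and resubmits < max_resubmits:
        return "resubmit"
    return "fail"
```

The DAG would then be created with `default_args=DEFAULT_ARGS`, so a pre-emption costs a pod resubmission while a genuine error still fails the task and goes through Airflow's normal retries.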

On Tue, Feb 13, 2018, 10:53 AM Kevin Lam <ke...@fathomhealth.co> wrote:

> Hi all,
>
> My team and I have been experimenting with Airflow and Kubernetes, and
> there has been a lot of activity recently with the Kubernetes Executor so
> hopefully someone can help us out.
>
> Specifically, we are using our own variant of the Kubernetes Executor to
> run some pods on pre-emptible VMs on GKE (
> https://cloud.google.com/kubernetes-engine/docs/concepts/preemptible-vm),
> and were wondering if anyone had any advice on how to handle node
> pre-emptions gracefully.
>
> Currently, if a node gets pre-empted and removed, our pod dies, causing
> the corresponding Airflow task to fail. In such cases we'd really like the
> pod to be recreated and the task to continue. At the same time, we want
> other 'normal' failures to cause the Airflow task to fail.
>
> One idea is to use Jobs instead of pods, but if I recall correctly there
> was already a lot of discussion on this topic for the Apache Kubernetes
> Executor, and in the end pods were chosen.
>
> Does anyone have any ideas about how to work with pre-emptible
> VMs + GKE + Airflow? Any help is appreciated!
>
> Thanks,
> Kevin
>
