We're fans of (c) because:

* Sometimes, new DAGs introduce new Python or system-level dependencies, and
it seems cleaner to update everything together. It's a big hammer, but we have
graceful restart of workers built into our platform, so we don't have to take
the system offline to update dependencies (see the Dockerfile sketch after
this list).

* It also gives us a way to roll back to a previous state after a bad upgrade
(i.e. roll back both DAGs and dependencies together).

* With the emergence of the KubernetesExecutor, bundling everything into one
image just feels cleaner; otherwise the system would be pulling down DAGs
repeatedly with every task execution.
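
For the curious, here's a minimal sketch of what option (c) can look like.
The base image name, tag, and paths are placeholders, not our actual
Dockerfile, and the dags path has to match your configured DAGs folder:

    # Hypothetical Airflow base image; substitute whatever you build on.
    FROM my-airflow-base:1.10

    # Install Python dependencies in the same image as the DAGs that
    # need them, so code and deps always move (and roll back) together.
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt

    # Bake the DAGs in. Tagging a new image per commit gives you an
    # atomic deploy/rollback unit for DAGs plus dependencies.
    COPY dags/ /usr/local/airflow/dags/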

We use an internal Docker registry, our GraphQL API, and the K8s API to
handle re-deployments. We also have a CLI to build and run the image locally
and to deploy code in a more ad-hoc fashion (for example, to deploy to a test
Airflow cluster).
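
In plain-kubectl terms, a deploy and a rollback look roughly like the sketch
below. Our tooling actually drives this through the registry, GraphQL API,
and K8s API, so the registry host, deployment, and container names here are
made up for illustration:

    # Build and push an image tagged per release.
    docker build -t registry.example.com/airflow:2019-05-08 .
    docker push registry.example.com/airflow:2019-05-08

    # Point the running deployment at the new tag; Kubernetes does a
    # rolling update, which pairs with graceful worker restarts.
    kubectl set image deployment/airflow-scheduler \
        scheduler=registry.example.com/airflow:2019-05-08
    kubectl rollout status deployment/airflow-scheduler

    # Bad upgrade? Roll DAGs and dependencies back together.
    kubectl rollout undo deployment/airflow-scheduler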

More info on the stack: https://www.astronomer.io/docs/ee-overview

-Ry

On Wed, May 08, 2019 at 4:32 PM, Ashwin Sai Shankar
<[email protected]> wrote:

> Hi!
> In an Airflow on kube deployment, which of the following options would you
> recommend for storing DAGs in a production env, and why? This is for about
> 1000 DAGs, and every day ~20 commits are made to the dags folder:
> a) EFS/NFS volume
> b) git sync (every container does a git pull before running a task)
> c) bake DAGs into the image
> d) s3-sync
>
> Thanks,
> Ash
