I think it is good to start simple.

I would start out on a single machine, using the LocalExecutor, running on
Docker with docker-compose and the puckel image:
https://github.com/puckel/docker-airflow (or your own customization
thereof).
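
Bringing that up is only a few commands. A rough sketch, assuming the
per-executor compose files the puckel repo ships (file names may have
changed since):

    # clone the puckel repo and start the LocalExecutor stack
    git clone https://github.com/puckel/docker-airflow.git
    cd docker-airflow

    # runs webserver, scheduler, and postgres on the one machine
    docker-compose -f docker-compose-LocalExecutor.yml up -d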

I would use a managed cloud database running Postgres, e.g. on RDS or
equivalent. This way you can keep the same database server if you switch to
k8s or otherwise change your infra.
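
Pointing Airflow at that database is just a connection string, which you
can set as an environment variable on your containers. A sketch, with
placeholder host and credentials:

    # tell airflow to use the external postgres instead of a local one
    export AIRFLOW__CORE__SQL_ALCHEMY_CONN='postgresql+psycopg2://airflow:<password>@<your-rds-endpoint>:5432/airflow'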

After that, you can experiment with switching to the CeleryExecutor.
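
The switch is mostly configuration, roughly like the following (the broker
URL is a placeholder for whatever redis or rabbitmq you run, and you would
also start one or more worker containers):

    # swap the executor and point airflow at a celery broker
    export AIRFLOW__CORE__EXECUTOR=CeleryExecutor
    export AIRFLOW__CELERY__BROKER_URL='redis://<your-redis-host>:6379/0'
    export AIRFLOW__CELERY__RESULT_BACKEND='db+postgresql://airflow:<password>@<your-rds-endpoint>:5432/airflow'

If I remember right, the puckel repo also ships a CeleryExecutor compose
file you can crib from.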

Going through the exercise of setting up a simple instance with puckel will
better prepare you for getting k8s spun up.

From there, as need dictates, you can try out k8s. It won't be as bad
having done things the simple way first. But with k8s you have more
complexity to manage and more problems to solve. E.g. with a single-machine
docker-compose setup, you can bind mount your code, and to deploy code you
just ssh into your machine and run git pull (see the sketch below). But
with k8s, you either have to bake your DAGs into your image, set up
git-sync, or the like.
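
To be concrete, that deploy step is a one-liner (host name and path below
are hypothetical):

    # from your workstation: update the bind-mounted DAGs folder
    ssh user@airflow-host 'cd /opt/airflow/dags && git pull'
    # the scheduler re-parses DAG files periodically, so no restart needed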

When you are ready to give k8s a spin, I think a good place to start is
the community-maintained stable airflow helm chart
https://github.com/helm/charts/tree/master/stable/airflow.
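
Installing it is only a couple of commands, roughly (helm 3 syntax, and
the stable repo URL as of this writing):

    # add the stable chart repo and install the airflow chart
    helm repo add stable https://kubernetes-charts.storage.googleapis.com
    helm repo update
    helm install airflow stable/airflow

You will almost certainly want a values file on top of that, e.g. to point
the chart at your existing postgres and pick an executor.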


On Sun, Mar 22, 2020 at 10:56 AM Yair Taito <[email protected]> wrote:

> Hi,
>
> My name is Yair, I'm working at Matrix BI in Israel as a Big Data
> architect.
> I tried to integrate Kubernetes and Airflow following the blog post
> https://kubernetes.io/blog/2018/06/28/airflow-on-kubernetes-part-1-a-different-kind-of-operator/.
> After a lot of effort, I think the repository
> https://github.com/apache/incubator-airflow.git does not correspond to
> the blog post.
> For example: the folder "scripts/ci/kubernetes/kube" doesn't exist in the
> repository.
>
> Can you please guide me?
>
> Another question: what is the difference between running Airflow on a
> standalone machine or inside Kubernetes as containers? What is the right
> configuration?
>
> Thanks a lot,
> Yair
>
