I think it is good to start simple. I would start out on a single machine, using the LocalExecutor, running on Docker via docker-compose, with the puckel image: https://github.com/puckel/docker-airflow (or your own customization thereof).
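Roughly something like this (untested sketch -- the image tag, credentials, and RDS hostname are placeholders; the EXECUTOR / POSTGRES_* variables are the ones the puckel entrypoint reads as far as I recall, and with EXECUTOR=Local that entrypoint also starts the scheduler alongside the webserver):

    version: "2.1"
    services:
      webserver:
        image: puckel/docker-airflow:1.10.9   # or your own customization
        restart: always
        environment:
          - EXECUTOR=Local                    # LocalExecutor: everything runs on this one machine
          - POSTGRES_HOST=mydb.abc123.us-east-1.rds.amazonaws.com  # hypothetical RDS endpoint
          - POSTGRES_USER=airflow
          - POSTGRES_PASSWORD=airflow
          - POSTGRES_DB=airflow
        volumes:
          - ./dags:/usr/local/airflow/dags    # bind mount your DAGs; deploying is a git pull on the host
        ports:
          - "8080:8080"
        command: webserver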
I would use a cloud database running Postgres, e.g. on RDS or equivalent. This way you can keep the same database server if you switch to k8s or otherwise change your infra. After that, you can experiment with switching to Celery.

Going through the exercise of setting up a simple instance with puckel will better prepare you for getting k8s spun up. From there, as need dictates, you can try out k8s. It won't be as bad having done things the simple way first. But with k8s you have more complexity to manage and more problems to solve. E.g. with a single-machine docker-compose setup, you can bind mount your code, and to deploy code you just ssh into your machine and run git pull. But with k8s, you either have to bake your DAGs into your image, or set up git-sync, etc.

When you are ready to give k8s a spin, I think a good place to start is the official Airflow helm chart: https://github.com/helm/charts/tree/master/stable/airflow (a rough install sketch follows after the quoted message below).

On Sun, Mar 22, 2020 at 10:56 AM Yair Taito <[email protected]> wrote:

> Hi,
>
> My name is Yair. I'm working at Matrix BI in Israel as a Big Data
> architect.
> I tried to integrate Kubernetes and Airflow following the blog
> https://kubernetes.io/blog/2018/06/28/airflow-on-kubernetes-part-1-a-different-kind-of-operator/
> After a lot of effort, I think the repository
> https://github.com/apache/incubator-airflow.git no longer corresponds to
> the blog. For example, the folder "scripts/ci/kubernetes/kube" doesn't
> exist in the repository.
>
> Can you please guide me?
>
> Another question: what is the difference between running Airflow on a
> standalone machine versus inside Kubernetes as containers? What is the
> right configuration?
>
> Thanks a lot,
> Yair
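As for the helm chart mentioned above, here is roughly what trying it out looks like, assuming Helm 3; the release name "airflow" and the values file are placeholders you would replace with your own:

    # add the stable charts repo that hosts stable/airflow
    helm repo add stable https://kubernetes-charts.storage.googleapis.com
    helm repo update

    # install the chart as a release named "airflow" (name is arbitrary)
    helm install airflow stable/airflow

    # in practice you will want to override the chart defaults:
    # helm install airflow stable/airflow -f my-values.yaml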
