It would be really good if you could share your experiences of running this on Kubernetes and ECS. I'm not aware of a good guide for running it on either, yet it would be a very useful and quick setup to start with, especially when combined with Deployment Manager and (probably) CloudFormation.
I'm talking to someone else who is looking at running on Kubernetes and potentially open-sourcing a generic template for Kubernetes deployments. Would it be possible to share your experiences? What tech are you using for specific issues?

- How do you deploy and sync DAGs? Are you using EFS?
- How do you build the container with Airflow + executables?
- Where do you send log files or log lines?
- How do you handle high availability?

I'm really looking forward to hearing how that's done, so we can put it on the wiki. Especially since GCP is now also starting to embrace Airflow, it'd be good to understand better how easy and quick it can be to deploy Airflow on GCP: https://cloud.google.com/blog/big-data/2017/07/how-to-aggregate-data-for-bigquery-using-apache-airflow

Rgds,
Gerard

On Wed, Jul 12, 2017 at 8:55 PM, Arthur Purvis wrote:

> For what it's worth, we've been running Airflow on ECS for a few years
> already.
>
> On Wed, Jul 12, 2017 at 12:21 PM, Grant Nicholas wrote:
>
> > Is having a static set of workers necessary? Launching a job on
> > Kubernetes from a cached Docker image takes a few seconds at most. I think
> > this is an acceptable delay for a batch processing system like Airflow.
> >
> > Additionally, if you launch workers dynamically, you can launch *any type*
> > of worker and don't have to statically allocate pools of worker types.
> > For example, a single DAG could use a Scala Docker image to do Spark
> > calculations, a C++ Docker image to use some low-level numerical library,
> > and a Python Docker image by default for any generic Airflow work. You can
> > also size workers according to their usage: maybe the Spark driver program
> > needs only a few GB of RAM, but the C++ numerical library needs many
> > hundreds.
> >
> > I agree there is a bit of extra bookkeeping to be done, but the tradeoff
> > is an important one to make explicitly.
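To make the dynamic-worker idea above a little more concrete, here is a minimal Python sketch. It is not from this thread and is not Airflow's API: it only builds a one-shot Kubernetes Pod manifest per task, picking the Docker image and memory size by task kind. The profile table, the image names, the sizes, and the helper names `pod_name` and `build_worker_pod` are all hypothetical. Submitting the manifest to a cluster (e.g. serializing it to JSON for `kubectl apply -f`) is not shown.

```python
import re
from typing import Any, Dict, List

# Hypothetical worker profiles: one Docker image and memory size per kind of task.
# Image names and sizes are illustrative assumptions, not real published images.
WORKER_PROFILES: Dict[str, Dict[str, str]] = {
    "spark": {"image": "example.org/airflow-spark-scala:latest", "memory": "4Gi"},
    "numeric": {"image": "example.org/airflow-cpp-numeric:latest", "memory": "300Gi"},
    "default": {"image": "example.org/airflow-python:latest", "memory": "1Gi"},
}


def pod_name(dag_id: str, task_id: str) -> str:
    """Turn DAG/task ids into a DNS-1123-style name (lowercase letters, digits, '-')."""
    raw = f"{dag_id}-{task_id}".lower()
    return re.sub(r"[^a-z0-9-]+", "-", raw)[:63].strip("-")


def build_worker_pod(dag_id: str, task_id: str, kind: str, command: List[str]) -> Dict[str, Any]:
    """Build a one-shot Pod manifest (as a plain dict) for a single task run."""
    # Unknown task kinds fall back to the generic Python worker.
    kind = kind if kind in WORKER_PROFILES else "default"
    profile = WORKER_PROFILES[kind]
    memory = profile["memory"]
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {
            "name": pod_name(dag_id, task_id),
            "labels": {"dag_id": dag_id, "worker_kind": kind},
        },
        "spec": {
            # The pod runs the task once; Kubernetes does not restart it.
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "worker",
                    "image": profile["image"],
                    "command": command,
                    # Size this pod for this task instead of for a static worker pool.
                    "resources": {
                        "requests": {"memory": memory},
                        "limits": {"memory": memory},
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    pod = build_worker_pod("etl_daily", "crunch_numbers", "numeric", ["./crunch"])
    print(pod["metadata"]["name"], pod["spec"]["containers"][0]["resources"])
```

The design choice being illustrated is the one Grant describes: image and resources are decided per task at launch time, so a DAG can mix Scala/Spark, C++ and Python workers without pre-allocated pools, at the cost of the extra bookkeeping of creating and cleaning up one pod per task.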
