For what it's worth, we've been running Airflow on ECS for a few years already.
On Wed, Jul 12, 2017 at 12:21 PM, Grant Nicholas <[email protected]> wrote:

> Is having a static set of workers necessary? Launching a job on Kubernetes
> from a cached docker image takes a few seconds at most. I think this is an
> acceptable delay for a batch processing system like Airflow.
>
> Additionally, if you dynamically launch workers you can start dynamically
> launching *any type* of worker, and you don't have to statically allocate
> pools of worker types. For example, a single DAG could use a Scala docker
> image to do Spark calculations, a C++ docker image to use some low-level
> numerical library, and a Python docker image by default for any generic
> Airflow work. You can also size workers according to their usage: maybe
> the Spark driver program only needs a few GBs of RAM while the C++
> numerical library needs many hundreds.
>
> I agree there is a bit of extra book-keeping that needs to be done, but
> the tradeoff is an important one to make explicitly.
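To make the "one DAG, many images" idea concrete, here is a minimal sketch using
the KubernetesPodOperator that later landed in Airflow's contrib tree (it did not
exist at the time of this thread). The image names, commands, and resource requests
are hypothetical placeholders, and the dict form of the resources argument is an
assumption about the contrib API; treat this as an illustration, not a recipe.

    # Hypothetical sketch: one DAG mixing per-task container images,
    # each pod sized for its own workload rather than a shared worker pool.
    from datetime import datetime

    from airflow import DAG
    from airflow.contrib.operators.kubernetes_pod_operator import (
        KubernetesPodOperator,
    )

    dag = DAG(
        dag_id="mixed_image_pipeline",
        start_date=datetime(2017, 7, 1),
        schedule_interval="@daily",
    )

    # Scala image for the Spark driver; a few GBs of RAM is enough here.
    spark_job = KubernetesPodOperator(
        task_id="spark_aggregation",
        name="spark-aggregation",
        namespace="default",
        image="example.com/spark-jobs:latest",  # hypothetical image
        cmds=["spark-submit"],
        arguments=["--class", "jobs.Aggregate", "/app/jobs.jar"],
        resources={"request_memory": "4Gi"},  # assumed dict form
        dag=dag,
    )

    # C++ image wrapping a memory-hungry numerical library; requests
    # far more RAM than the Spark driver, per-task rather than per-pool.
    numerics_job = KubernetesPodOperator(
        task_id="numerical_solve",
        name="numerical-solve",
        namespace="default",
        image="example.com/numerics:latest",  # hypothetical image
        cmds=["/usr/local/bin/solve"],
        resources={"request_memory": "200Gi"},  # assumed dict form
        dag=dag,
    )

    # Plain Python image for generic glue work.
    publish = KubernetesPodOperator(
        task_id="publish_results",
        name="publish-results",
        namespace="default",
        image="python:3.6-slim",
        cmds=["python", "-c", "print('publish')"],
        dag=dag,
    )

    spark_job >> numerics_job >> publish

The scheduler still tracks three ordinary tasks; only the container image and
resource request differ per task, which is the bookkeeping tradeoff the thread
is weighing against static worker pools.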
