Is having a static set of workers necessary? Launching a job on Kubernetes from a cached Docker image takes a few seconds at most. I think that's an acceptable delay for a batch-processing system like Airflow.
Additionally, if you launch workers dynamically you can launch *any type* of worker and you don't have to statically allocate pools of worker types. For example, a single DAG could use a Scala Docker image for Spark calculations, a C++ Docker image for some low-level numerical library, and a Python Docker image by default for generic Airflow tasks. You can also size workers according to their actual usage: maybe the Spark driver program only needs a few GB of RAM while the C++ numerical library needs many hundreds (see the sketch below). I agree this requires a bit of extra bookkeeping, but that's a tradeoff worth making explicitly.
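
To make this concrete, here's a rough sketch of what a mixed-image DAG could look like using something along the lines of Airflow's KubernetesPodOperator. The import path, scheduling kwargs, and resource kwargs vary across Airflow/provider versions, and the image names and commands below are hypothetical:

```python
from datetime import datetime

from airflow import DAG
# Import path differs across versions; this is the current provider location.
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator

with DAG(
    dag_id="mixed_image_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule=None,  # "schedule_interval" on older Airflow releases
    catchup=False,
) as dag:
    # Spark driver: runs in a Scala image with a modest memory footprint.
    spark_calcs = KubernetesPodOperator(
        task_id="spark_calculations",
        name="spark-calculations",
        namespace="airflow",
        image="registry.example.com/spark-scala-job:latest",  # hypothetical image
        cmds=["spark-submit"],
        arguments=["--class", "com.example.Job", "/app/job.jar"],
        get_logs=True,
    )

    # C++ numerical library: its own image, sized independently of the Spark task.
    cpp_numerics = KubernetesPodOperator(
        task_id="cpp_numerics",
        name="cpp-numerics",
        namespace="airflow",
        image="registry.example.com/cpp-numerics:latest",  # hypothetical image
        cmds=["/app/run_numerics"],
        # Per-task resource requests/limits would go here; the exact kwarg
        # (resources vs. container_resources) depends on the provider version.
        get_logs=True,
    )

    spark_calcs >> cpp_numerics
```

Any generic Airflow tasks in the same DAG could still run on the default Python image, and each pod gets its own resource requests, so nothing forces a one-size-fits-all worker.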
