Is having a static set of workers necessary? Launching a job on Kubernetes from 
a cached Docker image takes a few seconds at most, which seems like an acceptable 
delay for a batch processing system like Airflow.

Additionally, if you dynamically launch workers you can start launching *any 
type* of worker, and you don't have to statically allocate pools of worker 
types. For example, a single DAG could use a Scala Docker image to do Spark 
calculations, a C++ Docker image to use some low-level numerical library, and a 
Python Docker image by default for any generic Airflow work. You can also size 
workers according to their usage: maybe the Spark driver program only needs a 
few GB of RAM while the C++ numerical library needs many hundreds.
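As a rough sketch of what that could look like, here is a minimal example using 
Airflow's KubernetesPodOperator. This is not necessarily what anyone in this 
thread has built; the import path and parameter names vary across provider 
versions, and the image names, namespace, and resource sizes below are made up:

```python
# A minimal sketch, assuming the KubernetesPodOperator from the
# cncf.kubernetes provider. Images, namespace, and resource figures
# are hypothetical placeholders.
from datetime import datetime

from airflow import DAG
from airflow.providers.cncf.kubernetes.operators.pod import KubernetesPodOperator
from kubernetes.client import models as k8s

with DAG("mixed_worker_dag", start_date=datetime(2024, 1, 1), schedule=None) as dag:
    # Scala/Spark driver: the driver itself only needs a modest amount of RAM.
    spark_job = KubernetesPodOperator(
        task_id="spark_calculations",
        name="spark-calculations",
        namespace="default",
        image="example.com/spark-scala-job:latest",  # hypothetical image
        container_resources=k8s.V1ResourceRequirements(
            requests={"memory": "4Gi", "cpu": "2"},
        ),
    )

    # C++ numerical job: a much larger memory request for the same DAG.
    cpp_job = KubernetesPodOperator(
        task_id="numerical_crunch",
        name="numerical-crunch",
        namespace="default",
        image="example.com/cpp-numerics:latest",  # hypothetical image
        container_resources=k8s.V1ResourceRequirements(
            requests={"memory": "200Gi", "cpu": "8"},
        ),
    )

    spark_job >> cpp_job
```

Each task gets its own pod that goes away when the task finishes, so there is 
no long-lived pool of Scala, C++, or Python workers to provision ahead of time.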

I agree there is a bit of extra bookkeeping to be done, but the trade-off is an 
important one to make explicitly.
