Run them on different workers by using queues? That way different workers can have different third-party libs while sharing the same Airflow core.
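Something like this would pin a task to a named queue. It's just a sketch (untested; assumes Airflow 1.x with the Celery executor, and the DAG id, queue name, and callable are made up):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python_operator import PythonOperator

    dag = DAG(
        dag_id='score_players',           # hypothetical DAG id
        start_date=datetime(2018, 1, 1),
        schedule_interval='@daily',
    )

    def score():
        # Imported at run time, so this resolves against whatever conda
        # environment the worker process itself is running in.
        import sklearn
        print('scoring with scikit-learn %s' % sklearn.__version__)

    score_players = PythonOperator(
        task_id='score_players',
        python_callable=score,
        queue='sklearn_0_19',             # only workers listening on this queue pick it up
        dag=dag,
    )

Then start only the workers that have the newer scikit-learn env with:

    airflow worker -q sklearn_0_19

Everything else keeps consuming the default queue.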
B

Sent from a device with less than stellar autocorrect

> On Jan 30, 2018, at 9:13 AM, Dennis O'Brien <den...@dennisobrien.net> wrote:
>
> Hi All,
>
> I have a number of jobs that use scikit-learn for scoring players.
> Occasionally I need to upgrade scikit-learn to take advantage of some new
> features. We have a single conda environment that specifies all the
> dependencies for Airflow as well as for all of our DAGs. So currently,
> upgrading scikit-learn means upgrading it for all DAGs that use it and
> retraining all models for that version. It has become a very involved
> task, and I'm hoping to find a better way.
>
> One option is to use BashOperator (or something that wraps BashOperator)
> and have bash use a specific conda environment with that version of
> scikit-learn. While this is simple, I don't like limiting task input to
> the command line. Still, it's an option.
>
> Another option is the DockerOperator. But when I asked around at a
> previous Airflow Meetup, I couldn't find anyone actually using it. It
> also adds some complexity to the build and deploy process, since I would
> now have to maintain Docker images for all my environments. Still, I'm
> not ruling it out.
>
> And the last option I can think of is heterogeneous workers. We are
> migrating our Airflow infrastructure to AWS ECS (from EC2) and plan on
> supporting separate worker clusters, so this could include workers with
> different conda environments. I assume that as long as a few key packages
> (airflow, redis, celery?) are identical between scheduler and worker
> instances, the rest can differ.
>
> Has anyone faced this problem, and do you have any advice? Am I missing
> any simpler options? Any thoughts much appreciated.
>
> thanks,
> Dennis
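For the BashOperator option quoted above, a minimal sketch of what that could look like (the conda env name and script path are made up, and 'source activate' assumes a pre-4.4 conda):

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash_operator import BashOperator

    dag = DAG(
        dag_id='score_players_bash',      # hypothetical DAG id
        start_date=datetime(2018, 1, 1),
        schedule_interval='@daily',
    )

    score = BashOperator(
        task_id='score_players',
        bash_command=(
            # Activate the pinned environment, then run the job script.
            # {{ ds }} is standard Airflow templating for the execution date.
            'source activate sklearn_0_19 && '
            'python /opt/jobs/score_players.py --run-date {{ ds }}'
        ),
        dag=dag,
    )

You do end up squeezing all task input through the command line (or templated env vars), which is exactly the limitation Dennis calls out.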