Run them on different workers by using queues? That way different workers
can have different third-party libs while sharing the same Airflow core.
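
Something like this, as a rough sketch (the queue name, callable, and
`dag` object are all made up; assumes the CeleryExecutor):

    from airflow.operators.python_operator import PythonOperator

    # Route the task to a dedicated queue; only workers subscribed to
    # that queue will pick it up.
    score_players = PythonOperator(
        task_id='score_players',
        python_callable=score_players_fn,  # hypothetical callable
        queue='sklearn_019',               # matches one worker pool's conda env
        dag=dag,
    )

    # On a worker built from the matching conda env:
    #   airflow worker --queues sklearn_019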

B

Sent from a device with less than stellar autocorrect

> On Jan 30, 2018, at 9:13 AM, Dennis O'Brien <den...@dennisobrien.net> wrote:
> 
> Hi All,
> 
> I have a number of jobs that use scikit-learn for scoring players.
> Occasionally I need to upgrade scikit-learn to take advantage of some new
> features.  We have a single conda environment that specifies all the
> dependencies for Airflow as well as for all of our DAGs.  So currently
> upgrading scikit-learn means upgrading it for all DAGs that use it, and
> retraining all models for that version.  It becomes a very involved task
> and I'm hoping to find a better way.
> 
> One option is to use BashOperator (or something that wraps BashOperator)
> and have bash use a specific conda environment with that version of
> scikit-learn.  It's simple, but I don't like the idea of limiting task
> input to the command line.  Still, an option.
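
For reference, a minimal sketch of this option (the conda env name,
script path, and `dag` object are hypothetical):

    from airflow.operators.bash_operator import BashOperator

    # Activate a pinned conda env before running the scoring script.
    score_players = BashOperator(
        task_id='score_players',
        bash_command=(
            'source activate sklearn_019 && '
            'python /opt/jobs/score_players.py --ds {{ ds }}'
        ),
        dag=dag,
    )

Task input beyond the command line could still be passed via templated
environment variables (the `env` kwarg), though it stays string-based.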
> 
> Another option is the DockerOperator.  But when I asked around at a
> previous Airflow Meetup, I couldn't find anyone actually using it.  It also
> adds some complexity to the build and deploy process, since now I have to
> maintain Docker images for all my environments.  Still, not ruling it out.
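
If it helps, the DockerOperator route would look roughly like this (the
image name, command, and `dag` object are hypothetical):

    from airflow.operators.docker_operator import DockerOperator

    # Each pinned environment becomes a tagged image.
    score_players = DockerOperator(
        task_id='score_players',
        image='mycompany/scoring:sklearn-0.19',
        command='python /app/score_players.py --ds {{ ds }}',
        docker_url='unix://var/run/docker.sock',  # the default; shown for clarity
        dag=dag,
    )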
> 
> And the last option I can think of is just heterogeneous workers.  We are
> migrating our Airflow infrastructure to AWS ECS (from EC2) and plan to
> support separate worker clusters, so this could include workers
> with different conda environments.  I assume as long as a few key packages
> are identical between scheduler and worker instances (airflow, redis,
> celery?) the rest can be whatever.
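
From the DAG side, that setup could look like the sketch below, with one
task per worker cluster (queue names and callables are hypothetical;
assumes the CeleryExecutor and an existing `dag` object):

    from airflow.operators.python_operator import PythonOperator

    # Runs on the cluster whose conda env pins scikit-learn 0.19.
    score = PythonOperator(
        task_id='score_players',
        python_callable=score_fn,    # hypothetical
        queue='sklearn_019',
        dag=dag,
    )

    # Runs on the default cluster; no scikit-learn required.
    publish = PythonOperator(
        task_id='publish_scores',
        python_callable=publish_fn,  # hypothetical
        queue='default',
        dag=dag,
    )

    score >> publish  # dependencies work across worker clusters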
> 
> Has anyone faced this problem and have some advice?  Am I missing any
> simpler options?  Any thoughts much appreciated.
> 
> thanks,
> Dennis
