Hi Yaniv,

I am happy to pick up the following task:
1. Add to the JobManager the functionality to read action-level dependencies

Regards,
Kirupa

On Tue., 23 Oct. 2018, 11:04 am Yaniv Rodenski, <ya...@shinto.io> wrote:

> Hi Nadav,
>
> It does make sense; in fact, we already have action-level resources,
> however they are limited to the configuration files for the container.
> I also think we need to revisit the way we set those up. Currently we
> use YARN/Mesos to copy dependencies to the containers. With YARN 3.0 I
> think it makes sense to move to using Docker to manage resources in the
> containers.
> This should also have performance benefits and will make life easier (I
> hope) when we start working on K8s.
>
> To do this, I think we need to add the following tasks:
> 1. Add to the JobManager the functionality to read action-level
> dependencies
> 2. Move from Mesos/YARN containers to Docker (probably at least two tasks)
>
> I'll add them to JIRA asap, for version 0.2.1-incubating, if everyone is
> OK with it.
>
> On Sat, Oct 20, 2018 at 6:43 PM Nadav Har Tzvi <nadavhart...@gmail.com>
> wrote:
>
> > Hey everyone,
> >
> > Yaniv and I were just discussing how to resolve dependencies in the new
> > frameworks architecture and how to integrate the dependencies with the
> > concrete cluster resource manager (Mesos/YARN).
> > We rolled with the idea of each runner (or base runner) performing
> > dependency resolution on its own.
> > So, for example, the Spark Scala runner would resolve the required JARs
> > and do whatever it needs to do with them (e.g. spark-submit --jars
> > --packages --repositories, etc.).
> > The base Python provider will resolve dependencies and dynamically
> > generate a requirements.txt file that will be deployed to the executor.
> > The handling of the requirements.txt file differs between the concrete
> > Python runners. For example, a regular Python runner would simply run
> > pip install, while the pyspark runner would need to rearrange the
> > dependencies in a way that is acceptable to spark-submit (
> > https://bytes.grubhub.com/managing-dependencies-and-artifacts-in-pyspark-7641aa89ddb7
> > sounds like a decent approach; please comment if you have a better idea).
> >
> > So far, I hope it makes sense.
> >
> > The next item I want to discuss is as follows:
> > In the new architecture, we do hierarchical runtime environment
> > resolution, starting at the top job level and drilling down to the
> > action level, outputting one unified environment configuration file
> > that is deployed to the executor.
> > I suggest doing the same with dependencies.
> > Currently, we only have job-level dependencies. I suggest that we
> > provide action-level dependencies and resolve them in exactly the same
> > manner as we resolve the environment.
> > There should be quite a few benefits to this approach:
> >
> >    1. It will give us the option to have different versions of the same
> >    package in different actions. This is especially important if you
> >    have 2+ pipeline developers working independently; it would reduce
> >    integration costs by letting each action be more self-contained.
> >    2. It should lower the startup time per action. The more dependencies
> >    you have, the longer it takes to resolve and install them. Actions
> >    will no longer get any unnecessary dependencies.
> >
> > What do you think? Does it make sense?
> >
> > Cheers,
> > Nadav
> >
>
> --
> Yaniv Rodenski
>
> +61 477 778 405
> ya...@shinto.io
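
To make sure I understand the per-runner resolution described above, here is a
rough Scala sketch. All of the names (Artifact, DependencyResolver,
SparkScalaResolver, PythonResolver) are made up for illustration and are not
existing Amaterasu APIs; the only point is that every runner receives the same
resolved dependency list and turns it into whatever its launcher understands,
e.g. spark-submit flags for the Spark Scala runner versus requirements.txt
lines for the Python provider.

// Hypothetical sketch: each runner owns the translation of resolved
// dependencies into the form its launcher understands.
final case class Artifact(groupId: String, artifactId: String, version: String)

trait DependencyResolver {
  // Each concrete runner decides what "resolving" means for it.
  def resolve(deps: Seq[Artifact]): Seq[String]
}

// Spark Scala runner: hand the artifacts to spark-submit as --packages.
object SparkScalaResolver extends DependencyResolver {
  def resolve(deps: Seq[Artifact]): Seq[String] =
    if (deps.isEmpty) Seq.empty
    else Seq("--packages",
      deps.map(a => s"${a.groupId}:${a.artifactId}:${a.version}").mkString(","))
}

// Base Python provider: emit requirements.txt lines; the concrete Python
// runner decides whether to pip-install them or repackage them for pyspark.
object PythonResolver extends DependencyResolver {
  def resolve(deps: Seq[Artifact]): Seq[String] =
    deps.map(a => s"${a.artifactId}==${a.version}")
}

object ResolverDemo extends App {
  val deps = Seq(Artifact("org.apache.avro", "avro", "1.8.2"))
  println(SparkScalaResolver.resolve(deps).mkString(" ")) // --packages org.apache.avro:avro:1.8.2
  println(PythonResolver.resolve(deps).mkString("\n"))    // avro==1.8.2
}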
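
And for the hierarchical (job-level plus action-level) dependency resolution
Nadav suggests, a minimal sketch of one possible merge rule, mirroring the way
the environment is resolved. The names and the "action-level wins" policy are
assumptions for discussion, not current JobManager behaviour.

// Hypothetical sketch of hierarchical dependency resolution: start from the
// job-level dependency set, then let each action override or add entries,
// exactly like the environment merge.
final case class Dep(coordinate: String, version: String)

object DependencyMerger {
  // Key by coordinate so an action-level entry replaces the job-level
  // version of the same package, and new packages are simply added.
  def effectiveDeps(jobLevel: Seq[Dep], actionLevel: Seq[Dep]): Seq[Dep] = {
    val merged = (jobLevel ++ actionLevel)
      .map(d => d.coordinate -> d)
      .toMap // later (action-level) entries win
    merged.values.toSeq.sortBy(_.coordinate)
  }
}

object MergeDemo extends App {
  val job    = Seq(Dep("org.apache.avro:avro", "1.8.2"), Dep("joda-time:joda-time", "2.9.9"))
  val action = Seq(Dep("org.apache.avro:avro", "1.9.0")) // this action needs a newer avro
  DependencyMerger.effectiveDeps(job, action).foreach(println)
  // Dep(joda-time:joda-time,2.9.9)
  // Dep(org.apache.avro:avro,1.9.0)
}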