Thanks, Kirupa,

I'll create the JIRA tasks shortly and assign that one to you.



On Tue, Oct 23, 2018 at 5:09 PM Kirupa Devarajan <kirupagara...@gmail.com>
wrote:

> Hi Yaniv,
>
> I am happy to pick up the following task:
>
> 1. Add to the JobManager the functionality to read action-level
> dependencies
>
> Regards,
> Kirupa
>
> On Tue., 23 Oct. 2018, 11:04 am Yaniv Rodenski, <ya...@shinto.io> wrote:
>
> > Hi Nadav,
> >
> > It does make sense. In fact, we already have action-level resources;
> > however, they are limited to the configuration files for the container.
> > I also think that we need to revisit the way we set those up. Currently
> > we use YARN/Mesos to copy dependencies to the containers. With YARN 3.0,
> > I think it makes sense to move to Docker as the way to manage resources
> > in the containers.
> > This should also have performance benefits and will (I hope) make life
> > easier when we start working on K8s.
> >
> > To do this, I think we need to add the following tasks:
> > 1. Add to the JobManager the functionality to read action-level
> > dependencies
> > 2. Move from Mesos/YARN containers to Docker (probably at least two
> > tasks)
> >
> > I'll add them to JIRA ASAP, for version 0.2.1-incubating, if everyone
> > is OK with it.
> >
> > On Sat, Oct 20, 2018 at 6:43 PM Nadav Har Tzvi <nadavhart...@gmail.com>
> > wrote:
> >
> > > Hey everyone,
> > >
> > > Yaniv and I were just discussing how to resolve dependencies in the
> > > new frameworks architecture and integrate them with the concrete
> > > cluster resource manager (Mesos/YARN).
> > > We went with the idea of each runner (or base runner) performing its
> > > own dependency resolution.
> > > So, for example, the Spark Scala runner would resolve the required
> > > JARs and do whatever it needs to do with them (e.g. spark-submit
> > > --jars --packages --repositories, etc.).
> > > The base Python provider will resolve dependencies and dynamically
> > > generate a requirements.txt file that will be deployed to the
> > > executor.
> > > The handling of the requirements.txt file differs between the
> > > concrete Python runners. For example, a regular Python runner would
> > > simply run pip install, while the PySpark runner would need to
> > > rearrange the dependencies in a way that spark-submit will accept (
> > > https://bytes.grubhub.com/managing-dependencies-and-artifacts-in-pyspark-7641aa89ddb7
> > > sounds like a decent approach; please comment if you have a better
> > > idea).
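> > >
> > > To make this concrete, here is a rough sketch of what the runner side
> > > could look like (illustration only, not actual code from the project;
> > > the function names and the venv/zip packaging are assumptions loosely
> > > based on the post above):
> > >
> > >   # Rough sketch only; all names here are hypothetical.
> > >   import shutil
> > >   import subprocess
> > >   import sys
> > >   from pathlib import Path
> > >
> > >   def spark_submit_dep_args(jars, packages, repositories):
> > >       # Spark Scala runner: hand the resolved artifacts to spark-submit.
> > >       return ["--jars", ",".join(jars),
> > >               "--packages", ",".join(packages),
> > >               "--repositories", ",".join(repositories)]
> > >
> > >   def write_requirements(resolved_deps, target_dir):
> > >       # Base Python provider: dump the resolved dependencies into a
> > >       # requirements.txt that gets shipped to the executor.
> > >       req_file = Path(target_dir) / "requirements.txt"
> > >       req_file.write_text("\n".join(
> > >           f"{name}=={ver}" for name, ver in resolved_deps.items()))
> > >       return req_file
> > >
> > >   def install_plain_python(req_file):
> > >       # A regular Python runner can simply pip-install the file.
> > >       subprocess.check_call(["pip", "install", "-r", str(req_file)])
> > >
> > >   def package_for_pyspark(req_file, venv_dir="deps_venv"):
> > >       # The PySpark runner would instead install into a venv and ship
> > >       # an archive to the executors (roughly the Grubhub approach),
> > >       # e.g. via --archives / --py-files.
> > >       subprocess.check_call([sys.executable, "-m", "venv", venv_dir])
> > >       subprocess.check_call(
> > >           [f"{venv_dir}/bin/pip", "install", "-r", str(req_file)])
> > >       return shutil.make_archive("deps", "zip", root_dir=venv_dir)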
> > >
> > > So far I hope it makes sense.
> > >
> > > The next item I want to discuss is as follows:
> > > In the new architecture, we do hierarchical runtime environment
> > > resolution, starting at the top job level and drilling down to the
> > > action level, outputting one unified environment configuration file
> > > that is deployed to the executor.
> > > I suggest doing the same with dependencies.
> > > Currently, we only have job-level dependencies. I suggest that we
> > > provide action-level dependencies and resolve them in exactly the
> > > same manner as we resolve the environment (there is a small sketch of
> > > this after the list below).
> > > There should be quite a few benefits to this approach:
> > >
> > >    1. It will give us the option to have different versions of the
> > >    same package in different actions. This is especially important if
> > >    you have 2+ pipeline developers working independently; it would
> > >    reduce the integration costs by letting each action be more
> > >    self-contained.
> > >    2. It should lower the startup time per action. The more
> > >    dependencies you have, the longer it takes to resolve and install
> > >    them. Actions will no longer get any unnecessary dependencies.
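> > >
> > > As a rough sketch of the resolution itself (the dependency format and
> > > the resolve_action_deps helper below are made up for illustration,
> > > not the actual spec):
> > >
> > >   # Hypothetical example: job-level deps plus per-action overrides.
> > >   job_level_deps = {
> > >       "pandas": "0.23.4",
> > >       "requests": "2.19.1",
> > >   }
> > >
> > >   action_level_deps = {
> > >       "start": {},                     # inherits job-level deps only
> > >       "train": {"numpy": "1.15.2"},    # adds a dep for this action
> > >       "report": {"pandas": "0.20.3"},  # overrides the job-level version
> > >   }
> > >
> > >   def resolve_action_deps(action_name):
> > >       # Resolve exactly like the environment: start from the job
> > >       # level, then let the action level add or override.
> > >       resolved = dict(job_level_deps)
> > >       resolved.update(action_level_deps.get(action_name, {}))
> > >       return resolved
> > >
> > >   # resolve_action_deps("report")
> > >   # -> {"pandas": "0.20.3", "requests": "2.19.1"}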
> > >
> > >
> > > What do you think? Does it make sense?
> > >
> > > Cheers,
> > > Nadav
> > >
> >
> >
> > --
> > Yaniv Rodenski
> >
> > +61 477 778 405
> > ya...@shinto.io
> >
>


-- 
Yaniv Rodenski

+61 477 778 405
ya...@shinto.io
