Thanks, Kirupa, I'll create the JIRA tasks shortly and assign that one to you.
On Tue, Oct 23, 2018 at 5:09 PM Kirupa Devarajan <kirupagara...@gmail.com> wrote:
> Hi Yaniv,
>
> I am happy to pick up the following task:
>
> 1. Add to the JobManager the functionality to read action level dependencies
>
> Regards,
> Kirupa
>
> On Tue., 23 Oct. 2018, 11:04 am Yaniv Rodenski, <ya...@shinto.io> wrote:
>
> > Hi Nadav,
> >
> > It does make sense; in fact, we already have action level resources, however they are limited to the configuration files for the container.
> > I also think we need to revisit the way we set those up. Currently we use YARN/Mesos to copy dependencies to the containers. With YARN 3.0 I think it makes sense to move to Docker as the way to manage resources in the containers.
> > This should also have performance benefits and will make life easier (I hope) when we start working on K8s.
> >
> > To do this, I think we need to add the following tasks:
> > 1. Add to the JobManager the functionality to read action level dependencies
> > 2. Move from Mesos/YARN containers to Docker (probably at least two tasks)
> >
> > I'll add them to JIRA ASAP, for version 0.2.1-incubating, if everyone is OK with it.
> >
> > On Sat, Oct 20, 2018 at 6:43 PM Nadav Har Tzvi <nadavhart...@gmail.com> wrote:
> >
> > > Hey everyone,
> > >
> > > Yaniv and I were just discussing how to resolve dependencies in the new frameworks architecture and integrate them with the concrete cluster resource manager (Mesos/YARN).
> > > We settled on the idea of each runner (or base runner) performing dependency resolution on its own.
> > > So, for example, the Spark Scala runner would resolve the required JARs and do whatever it needs to do with them (e.g. spark-submit --jars, --packages, --repositories, etc.).
> > > The base Python provider will resolve dependencies and dynamically generate a requirements.txt file that will be deployed to the executor.
> > > The handling of the requirements.txt file differs between the concrete Python runners. For example, a regular Python runner would simply run pip install, while the PySpark runner would need to rearrange the dependencies in a way that is acceptable to spark-submit (https://bytes.grubhub.com/managing-dependencies-and-artifacts-in-pyspark-7641aa89ddb7 sounds like a decent idea; please comment if you have a better one).
> > >
> > > So far I hope it makes sense.
> > >
> > > The next item I want to discuss is as follows:
> > > In the new architecture, we do hierarchical runtime environment resolution, starting at the top job level and drilling down to the action level, outputting one unified environment configuration file that is deployed to the executor.
> > > I suggest doing the same with dependencies.
> > > Currently, we only have job level dependencies. I suggest that we provide action level dependencies and resolve them in exactly the same manner as we resolve the environment.
> > > There should be quite a few benefits to this approach:
> > >
> > > 1. It gives the option to have different versions of the same package in different actions. This is especially important if you have 2+ pipeline developers working independently; it reduces integration costs by letting each action be more self-contained.
> > > 2. It should lower the startup time per action.
> > > The more dependencies you have, the longer it takes to resolve and install them. Actions will no longer pull in unnecessary dependencies.
> > >
> > > What do you think? Does it make sense?
> > >
> > > Cheers,
> > > Nadav
> >
> > --
> > Yaniv Rodenski
> > +61 477 778 405
> > ya...@shinto.io

--
Yaniv Rodenski
+61 477 778 405
ya...@shinto.io
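
[Editor's note] To make the per-runner dependency resolution Nadav describes more concrete, here is a minimal sketch of how a base Python provider could turn a dependency map into a requirements.txt for the executor. The data shape, function name, and file layout are assumptions for illustration, not the actual Amaterasu implementation.

    # Hypothetical sketch: write a requirements.txt that a Python runner
    # could ship to the executor. Names and structure are illustrative only.
    def write_requirements(dependencies, output_path="requirements.txt"):
        """dependencies: dict of package name -> version specifier,
        e.g. {"requests": "==2.20.0"}."""
        lines = [pkg + spec for pkg, spec in sorted(dependencies.items())]
        with open(output_path, "w") as f:
            f.write("\n".join(lines) + "\n")

    # A plain Python runner could then simply run:
    #   pip install -r requirements.txt
    # while a PySpark runner would instead repackage the resolved dependencies
    # (e.g. into an archive passed to spark-submit), along the lines of the
    # Grubhub article linked in the thread.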
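[Editor's note] The proposed action-level dependencies would be resolved hierarchically, the same way the runtime environment is. A sketch of that merge, under the assumption that dependencies are simple package-to-version maps (the real configuration format may differ):

    # Hypothetical sketch of hierarchical dependency resolution: job-level
    # dependencies are the base, and action-level entries override or extend
    # them, mirroring the environment resolution described above.
    def resolve_action_dependencies(job_deps, action_deps):
        """Both arguments are dicts of package -> version specifier.
        Action-level entries win on conflict, so two actions can pin
        different versions of the same package."""
        merged = dict(job_deps)
        merged.update(action_deps)
        return merged

    # Example:
    #   job_deps    = {"pandas": "==0.23.4", "requests": "==2.20.0"}
    #   action_deps = {"pandas": "==0.19.2"}   # this action needs an older pandas
    #   resolve_action_dependencies(job_deps, action_deps)
    #   -> {"pandas": "==0.19.2", "requests": "==2.20.0"}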