Hey everyone,

Yaniv and I were just discussing how to resolve dependencies in the new
frameworks architecture and integrate the dependencies with the concrete
cluster resource manager (Mesos/YARN).
We rolled with the idea of each runner (or base runner) performing
dependency resolution on its own.
So for example, the Spark Scala runner would resolve the required JARs and
do whatever it needs to do with them (e.g. spark-submit --jars --packages
--repositories, etc).
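
To make the Spark Scala case concrete, here is a rough Python sketch of how resolved
JVM dependencies could be turned into spark-submit arguments. The JvmDependencies
class and the spark_submit_args function are hypothetical names for illustration
only; the three flags are the ones mentioned above, each of which takes a
comma-separated list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class JvmDependencies:
    """Hypothetical container for what a Spark Scala runner has resolved."""
    jars: List[str] = field(default_factory=list)          # paths to JAR files
    packages: List[str] = field(default_factory=list)      # Maven coordinates (group:artifact:version)
    repositories: List[str] = field(default_factory=list)  # extra Maven repository URLs


def spark_submit_args(deps: JvmDependencies, app_jar: str) -> List[str]:
    """Build the spark-submit argument list, skipping flags with no values."""
    args = ["spark-submit"]
    if deps.jars:
        args += ["--jars", ",".join(deps.jars)]
    if deps.packages:
        args += ["--packages", ",".join(deps.packages)]
    if deps.repositories:
        args += ["--repositories", ",".join(deps.repositories)]
    args.append(app_jar)
    return args
```
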
The base Python provider will resolve dependencies and dynamically generate
a requirements.txt file that will be deployed to the executor.
The handling of the requirements.txt file differs between concrete Python
runners. For example, a regular Python runner would simply run pip install,
while the pyspark runner would need to rearrange the dependencies in a way
that is acceptable to spark-submit (
https://bytes.grubhub.com/managing-dependencies-and-artifacts-in-pyspark-7641aa89ddb7
sounds like a decent idea; please comment if you have a better one).
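
A minimal sketch of the Python side, under a few assumptions: resolved dependencies
are a simple name-to-version mapping, and the pyspark runner installs them into a
directory that is later zipped and passed with --py-files. All function names here
are hypothetical, and the zip-based packaging is only one possible way to satisfy
spark-submit, not a settled design.

```python
from typing import Dict, List, Optional


def render_requirements(packages: Dict[str, Optional[str]]) -> str:
    """Render {name: version or None} as the content of a requirements.txt file."""
    lines = []
    for name in sorted(packages):
        version = packages[name]
        # Pin the version when one was resolved, otherwise leave it open.
        lines.append(f"{name}=={version}" if version else name)
    return "".join(line + "\n" for line in lines)


def plain_python_install(req_file: str) -> List[str]:
    """Regular Python runner: install straight into the executor's environment."""
    return ["pip", "install", "-r", req_file]


def pyspark_install(req_file: str, target_dir: str) -> List[str]:
    """pyspark runner (assumed approach): install into a directory to be zipped later."""
    return ["pip", "install", "-r", req_file, "-t", target_dir]


def pyspark_submit_flags(archive: str) -> List[str]:
    """Flags that ship the zipped dependencies along with the job."""
    return ["--py-files", archive]
```

The commands are returned as argument lists rather than executed, so each concrete
runner can decide how and where to run them.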

I hope this makes sense so far.

The next item I want to discuss is as follows:
In the new architecture, we do hierarchical runtime environment resolution,
starting at the top job level and drilling down to the action level,
outputting one unified environment configuration file that is deployed to
the executor.
I suggest doing the same with dependencies.
Currently, we only have job level dependencies. I suggest that we provide
action level dependencies and resolve them in exactly the same manner as we
resolve the environment.
There should be quite a few benefits to this approach:

   1. It will give the option to have different versions of the same
   package in different actions. This is especially important if you have 2+
   pipeline developers working independently: it would reduce the
   integration costs by letting each action be more self-contained.
   2. It should lower the startup time per action. The more dependencies
   you have, the longer it takes to resolve and install them. Actions will no
   longer get any unnecessary dependencies.
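
Here is a rough Python sketch of the resolution I have in mind, assuming that
"the same manner as the environment" means lower levels override higher ones.
The resolve_dependencies function and the name-to-version mapping are
hypothetical, and the package versions are made up for illustration.

```python
from typing import Dict


def resolve_dependencies(*levels: Dict[str, str]) -> Dict[str, str]:
    """Merge dependency maps from the most general level to the most specific.

    Later (more specific) levels override earlier ones, so an action can pin
    a different version of a package than the job does.
    """
    resolved: Dict[str, str] = {}
    for level in levels:
        resolved.update(level)
    return resolved


# Illustrative data only.
job_deps = {"numpy": "1.15.4"}
action_a_deps = {"pandas": "0.23.4"}   # adds a package only this action needs
action_b_deps = {"numpy": "1.16.0"}    # overrides the job-level version

action_a_resolved = resolve_dependencies(job_deps, action_a_deps)
action_b_resolved = resolve_dependencies(job_deps, action_b_deps)
```

With this, action A gets numpy from the job plus its own pandas, while action B
gets only its own numpy version and never installs pandas.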


What do you think? Does it make sense?

Cheers,
Nadav
