That makes a lot of sense. I can see other runners following suit, with a packaged-up version for different scenarios / backend cluster runtimes.
Should this be part of Apache Beam as a separate Maven module, a sub-module inside an existing one, or something else?

On Thu, Jul 7, 2016 at 1:49 PM, Amit Sela <[email protected]> wrote:
> Hi everyone,
>
> Lately I've encountered a number of issues stemming from the fact that the
> Spark runner does not package Spark along with it, forcing people to do
> this on their own.
> In addition, this seems to get in the way of having beam-examples executed
> against the Spark runner, again because it would have to add Spark
> dependencies.
>
> When running on a cluster (which I guess was the original goal here), it is
> recommended to have Spark provided by the cluster - this makes sense for
> Spark clusters and more so for Spark + YARN clusters, where you might have
> your Spark built against a specific Hadoop version or using a vendor
> distribution.
>
> In order to make the runner more accessible to new adopters, I suggest we
> consider releasing a "spark-included" artifact as well.
>
> Thoughts?
>
> Thanks,
> Amit
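For illustration, here is a rough sketch of what a "spark-included" module POM might look like: it depends on the existing Spark runner (which keeps Spark in "provided" scope for cluster deployments) and additionally pulls the Spark artifacts in at compile scope so they ship with the jar. The module name, parent coordinates, and versions below are just placeholders, not a proposal of exact coordinates:

    <!-- Hypothetical module (names/versions illustrative only): bundles Spark with the runner -->
    <project xmlns="http://maven.apache.org/POM/4.0.0">
      <modelVersion>4.0.0</modelVersion>
      <parent>
        <groupId>org.apache.beam</groupId>
        <artifactId>beam-runners-parent</artifactId>
        <version>0.2.0-incubating-SNAPSHOT</version>
      </parent>
      <artifactId>beam-runners-spark-included</artifactId>

      <dependencies>
        <!-- The regular Spark runner, which leaves Spark as "provided" by the cluster -->
        <dependency>
          <groupId>org.apache.beam</groupId>
          <artifactId>beam-runners-spark</artifactId>
          <version>${project.version}</version>
        </dependency>
        <!-- Spark added at compile scope so it is packaged with this artifact -->
        <dependency>
          <groupId>org.apache.spark</groupId>
          <artifactId>spark-core_2.10</artifactId>
          <version>1.6.2</version>
        </dependency>
        <dependency>
          <groupId>org.apache.spark</groupId>
          <artifactId>spark-streaming_2.10</artifactId>
          <version>1.6.2</version>
        </dependency>
      </dependencies>
    </project>

With something like that in place, new adopters (and beam-examples) could depend on the single "spark-included" artifact for local runs, while cluster users keep using the existing runner artifact with Spark provided by YARN or their vendor distribution.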
