Yes, the drivers support executor memory directly too. What was the reason you didn't want to use the Spark submit process for executing drivers? I understand we have to find our jars and set up Kryo.
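
For context, a minimal sketch of the client-side setup being discussed: Kryo serialization, executor memory, and job jars configured directly on a SparkConf before the context is created. The property values and the jar path are illustrative assumptions, not Mahout's actual mahoutSparkContext code.

    // Sketch only: Spark 1.x client-side configuration done by hand,
    // i.e. without going through the spark-submit launcher.
    import org.apache.spark.{SparkConf, SparkContext}

    object ClientSetupSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("mahout-driver")
          .setMaster(sys.env.getOrElse("MASTER", "local[*]"))
          // Kryo setup the drivers and shell need regardless of how they are launched.
          .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
          .set("spark.kryoserializer.buffer.mb", "200")   // Spark 1.x property name
          // Executor memory configured on the client, as discussed in the thread.
          .set("spark.executor.memory", "4g")
          // "Find our jars": hypothetical path to the assembly the executors need.
          .setJars(Seq("target/mahout-spark_2.10-job.jar"))

        val sc = new SparkContext(conf)
        // ... run the driver logic ...
        sc.stop()
      }
    }
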
On Nov 11, 2014, at 6:00 PM, Dmitriy Lyubimov <[email protected]> wrote:

which is why i explicitly configure executor memory on the client. Although
even that interpretation depends on the resource manager A LOT it seems.

On Tue, Nov 11, 2014 at 5:49 PM, Pat Ferrel <[email protected]> wrote:

> The submit code is the only place that documents which are needed by
> clients AFAICT. It is pretty complicated and heavily laden with checks for
> which cluster manager is being used. I'd feel a lot better if we were using
> it. There is no way any of us are going to be able to test on all those
> configurations.
>
> spark-env.sh is mostly for launching the cluster, not the client, but there
> seem to be exceptions like executor memory.
>
>
> On Nov 11, 2014, at 2:18 PM, Dmitriy Lyubimov <[email protected]> wrote:
>
> these files, if i read it correctly, are for spawning yet another process. i
> don't see how it may work for the shell.
>
> I am also not convinced that spark-env is important for the client.
>
>
> On Tue, Nov 11, 2014 at 2:09 PM, Pat Ferrel <[email protected]> wrote:
>
>> I was thinking -Dx=y too, seems like a good idea.
>>
>> But we should also support setting them the way Spark documents in
>> spark-env.sh, and the two links Andrew found may solve that in a
>> maintainable way. Maybe we get the SparkConf from a new mahoutSparkConf
>> function, which handles all env-supplied setup. For the drivers it can be
>> done in the base class, allowing CLI overrides later. Then the SparkConf
>> is finally passed in to mahoutSparkContext, where as little as possible is
>> changed in the conf.
>>
>> I'll look at this for the drivers. Should be easy to add to the shell.
>>
>> On Nov 11, 2014, at 12:36 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>
>> IMO you just need to modify `mahout spark-shell` to propagate -Dx=y
>> parameters to the java startup call and all should be fine.
>>
>> On Tue, Nov 11, 2014 at 12:23 PM, Andrew Palumbo <[email protected]> wrote:
>>
>>> I've run into this problem starting $ mahout shell-script, i.e. needing
>>> to set spark.kryoserializer.buffer.mb and spark.akka.frameSize. I've
>>> been temporarily hard coding them for now while developing.
>>>
>>> I'm just getting familiar with what you've done with the CLI drivers. For
>>> #2, could we borrow option parsing code/methods from Spark [1] [2] at each
>>> (Spark) release and somehow add this to
>>> MahoutOptionParser.parseSparkOptions?
>>>
>>> I'll hopefully be doing some CLI work soon and have a better understanding.
>>>
>>> [1] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmitDriverBootstrapper.scala
>>> [2] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
>>>
>>>> From: [email protected]
>>>> Subject: Spark options
>>>> Date: Wed, 5 Nov 2014 09:48:59 -0800
>>>> To: [email protected]
>>>>
>>>> Spark has a launch script, as Hadoop does. We use the Hadoop launcher
>>>> script but not the Spark one. When starting up your Spark cluster there is
>>>> a spark-env.sh script that can set a bunch of environment variables. In our
>>>> own mahoutSparkContext function, which takes the place of the Spark submit
>>>> script and launcher, we don't account for most of the environment variables.
>>>>
>>>> Unless I missed something, this means most of the documented options will
>>>> be ignored unless a user of Mahout parses and sets them in their own
>>>> SparkConf. The Mahout CLI drivers don't do this for all possible options,
>>>> only supporting a few like job name and spark.executor.memory.
>>>>
>>>> The question is how best to handle these Spark options. There seem to be
>>>> two options:
>>>> 1) use Spark's launch mechanism for drivers but allow some options to be
>>>> overridden in the CLI
>>>> 2) add parsing of the env for options and set the SparkConf defaults in
>>>> mahoutSparkContext from those variables.
>>>>
>>>> The downside of #2 is that as variables change we'll have to reflect
>>>> those in our code. I forget why #1 is not an option, but Dmitriy has been
>>>> consistently against it; in any case it would mean a fair bit of
>>>> refactoring, I believe.
>>>>
>>>> Any opinions or corrections?
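
A hedged sketch of option #2 from the original message: a hypothetical mahoutSparkConf() helper that folds -Dspark.*=... system properties (for example, propagated by the `mahout spark-shell` launcher as Dmitriy suggests) plus a couple of spark-env.sh style variables into SparkConf defaults. The function name and the env-variable mapping are assumptions for illustration, not the actual Mahout API.

    import org.apache.spark.SparkConf
    import scala.collection.JavaConverters._

    object MahoutSparkConfSketch {

      def mahoutSparkConf(): SparkConf = {
        val conf = new SparkConf()

        // 1) Any -Dspark.foo=bar passed on the JVM command line becomes a default.
        //    (new SparkConf() already loads spark.* system properties; the loop just
        //    makes that behavior explicit and also covers a conf built with
        //    loadDefaults = false.)
        for ((k, v) <- System.getProperties.asScala if k.startsWith("spark."))
          conf.setIfMissing(k, v)

        // 2) A small, explicit mapping of spark-env.sh variables the clients care about.
        //    This is the maintenance cost noted above: the list has to track Spark releases.
        sys.env.get("SPARK_EXECUTOR_MEMORY").foreach(m => conf.setIfMissing("spark.executor.memory", m))
        sys.env.get("MASTER").foreach(m => conf.setIfMissing("spark.master", m))

        conf
      }
    }

The CLI drivers would then layer their own parsed options on top with conf.set(...), so explicit flags win over env and system-property defaults before the conf is handed to mahoutSparkContext.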
