Yes, the drivers support executor memory directly too. What was the reason you didn't want to use the Spark submit process for executing drivers? I understand we have to find our jars and set up Kryo.
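
For context, a minimal sketch of the client-side setup being discussed: Kryo serialization, executor memory, and job jars configured directly on a SparkConf before the context is created. The property values and the jar path are illustrative assumptions, not Mahout's actual mahoutSparkContext code.

    // Sketch only: Spark 1.x client-side configuration done by hand,
    // i.e. without going through the spark-submit launcher.
    import org.apache.spark.{SparkConf, SparkContext}

    object ClientSetupSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("mahout-driver")
          .setMaster(sys.env.getOrElse("MASTER", "local[*]"))
          // Kryo setup the drivers and shell need regardless of how they are launched.
          .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
          .set("spark.kryoserializer.buffer.mb", "200")   // Spark 1.x property name
          // Executor memory configured on the client, as discussed in the thread.
          .set("spark.executor.memory", "4g")
          // "Find our jars": hypothetical path to the assembly the executors need.
          .setJars(Seq("target/mahout-spark_2.10-job.jar"))

        val sc = new SparkContext(conf)
        // ... run the driver logic ...
        sc.stop()
      }
    }
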
On Nov 11, 2014, at 6:00 PM, Dmitriy Lyubimov <[email protected]> wrote:

which is why i explicitly configure executor memory on the client. Although
even that interpretation depends on the resource manager A LOT it seems.

On Tue, Nov 11, 2014 at 5:49 PM, Pat Ferrel <[email protected]> wrote:

> The submit code is the only place that documents which are needed by
> clients AFAICT. It is pretty complicated and heavily laden with checks for
> which cluster manager is being used. I'd feel a lot better if we were using
> it. There is no way any of us are going to be able to test on all those
> configurations.
>
> spark-env.sh is mostly for launching the cluster, not the client, but there
> seem to be exceptions like executor memory.
>
>
> On Nov 11, 2014, at 2:18 PM, Dmitriy Lyubimov <[email protected]> wrote:
>
> these files, if i read it correctly, are for spawning yet another process. i
> don't see how it may work for the shell.
>
> I am also not convinced that spark-env is important for the client.
>
>
> On Tue, Nov 11, 2014 at 2:09 PM, Pat Ferrel <[email protected]> wrote:
>
>> I was thinking -Dx=y too, seems like a good idea.
>>
>> But we should also support setting them the way Spark documents in
>> spark-env.sh, and the two links Andrew found may solve that in a
>> maintainable way. Maybe we get the SparkConf from a new mahoutSparkConf
>> function, which handles all env-supplied setup. For the drivers it can be
>> done in the base class, allowing CLI overrides later. Then the SparkConf
>> is finally passed in to mahoutSparkContext, where as little as possible is
>> changed in the conf.
>>
>> I'll look at this for the drivers. Should be easy to add to the shell.
>>
>> On Nov 11, 2014, at 12:36 PM, Dmitriy Lyubimov <[email protected]> wrote:
>>
>> IMO you just need to modify `mahout spark-shell` to propagate -Dx=y
>> parameters to the java startup call and all should be fine.
>>
>> On Tue, Nov 11, 2014 at 12:23 PM, Andrew Palumbo <[email protected]> wrote:
>>
>>> I've run into this problem starting $ mahout shell-script, i.e. needing
>>> to set spark.kryoserializer.buffer.mb and spark.akka.frameSize. I've
>>> been temporarily hard coding them for now while developing.
>>>
>>> I'm just getting familiar with what you've done with the CLI drivers. For
>>> #2, could we borrow option parsing code/methods from Spark [1] [2] at each
>>> (Spark) release and somehow add this to
>>> MahoutOptionParser.parseSparkOptions?
>>>
>>> I'll hopefully be doing some CLI work soon and have a better understanding.
>>>
>>> [1] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmitDriverBootstrapper.scala
>>> [2] https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala
>>>
>>>> From: [email protected]
>>>> Subject: Spark options
>>>> Date: Wed, 5 Nov 2014 09:48:59 -0800
>>>> To: [email protected]
>>>>
>>>> Spark has a launch script, as Hadoop does. We use the Hadoop launcher
>>>> script but not the Spark one. When starting up your Spark cluster there is
>>>> a spark-env.sh script that can set a bunch of environment variables. In our
>>>> own mahoutSparkContext function, which takes the place of the Spark submit
>>>> script and launcher, we don't account for most of the environment variables.
>>>>
>>>> Unless I missed something, this means most of the documented options will
>>>> be ignored unless a user of Mahout parses and sets them in their own
>>>> SparkConf. The Mahout CLI drivers don't do this for all possible options,
>>>> only supporting a few like job name and spark.executor.memory.
>>>>
>>>> The question is how best to handle these Spark options. There seem to be
>>>> two options:
>>>> 1) use Spark's launch mechanism for drivers but allow some options to be
>>>> overridden in the CLI
>>>> 2) add parsing of the env for options and set the SparkConf defaults in
>>>> mahoutSparkContext from those variables.
>>>>
>>>> The downside of #2 is that as variables change we'll have to reflect
>>>> those in our code. I forget why #1 is not an option, but Dmitriy has been
>>>> consistently against it; in any case it would mean a fair bit of
>>>> refactoring, I believe.
>>>>
>>>> Any opinions or corrections?
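
A hedged sketch of option #2 from the original message: a hypothetical mahoutSparkConf() helper that folds -Dspark.*=... system properties (for example, propagated by the `mahout spark-shell` launcher as Dmitriy suggests) plus a couple of spark-env.sh style variables into SparkConf defaults. The function name and the env-variable mapping are assumptions for illustration, not the actual Mahout API.

    import org.apache.spark.SparkConf
    import scala.collection.JavaConverters._

    object MahoutSparkConfSketch {

      def mahoutSparkConf(): SparkConf = {
        val conf = new SparkConf()

        // 1) Any -Dspark.foo=bar passed on the JVM command line becomes a default.
        //    (new SparkConf() already loads spark.* system properties; the loop just
        //    makes that behavior explicit and also covers a conf built with
        //    loadDefaults = false.)
        for ((k, v) <- System.getProperties.asScala if k.startsWith("spark."))
          conf.setIfMissing(k, v)

        // 2) A small, explicit mapping of spark-env.sh variables the clients care about.
        //    This is the maintenance cost noted above: the list has to track Spark releases.
        sys.env.get("SPARK_EXECUTOR_MEMORY").foreach(m => conf.setIfMissing("spark.executor.memory", m))
        sys.env.get("MASTER").foreach(m => conf.setIfMissing("spark.master", m))

        conf
      }
    }

The CLI drivers would then layer their own parsed options on top with conf.set(...), so explicit flags win over env and system-property defaults before the conf is handed to mahoutSparkContext.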
