Hi,
This is just a thought from my experience setting up Spark to run on a
Linux cluster. I found it a bit unusual that some parameters could be
specified as command-line args to spark-submit, others as env variables,
and some in a configuration file. What I ended up doing was writing my own
bash script that exported all the variables, plus other scripts that called
spark-submit with the arguments I wanted.
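
For reference, the wrapper approach looked roughly like this (just a sketch
from memory; the file names, master host and job class here are made up for
illustration):

# spark-env-common.sh -- one place for the shared settings
export SPARK_DRIVER_MEMORY=4g
export SPARK_EXECUTOR_MEMORY=8g

# run-myjob.sh -- per-job script that sources the common settings
source ./spark-env-common.sh
spark-submit \
  --master spark://master-host:7077 \
  --driver-memory "$SPARK_DRIVER_MEMORY" \
  --executor-memory "$SPARK_EXECUTOR_MEMORY" \
  --class com.example.MyJob \
  myjob.jar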

I think that the "expression language" idea would be doable by using an
entirely env-variable-based approach, or as command-line parameters. That
way there is only one configuration, which is easily scriptable, and you
are still able to express relations like:
spark.driver.maxResultSize = spark.driver.memory * 0.8
in your config as
export SPARK_DRIVER_MAXRESULTSIZE=$(bc -l <<< "0.8 * $SPARK_DRIVER_MEMORY")

It may not look as nice, but it does allow for everything to be in one
place, and to have separate config files for certain jobs. Admittedly, if
you want something like 0.8 * 2G, you first have to write a bash function to
expand all the "G M k" suffixes, as in the sketch below, but that's not too
painful.
On Mar 31, 2015 2:39 AM, "Reynold Xin" <r...@databricks.com> wrote:

> Reviving this to see if others would like to chime in about this
> "expression language" for config options.
>
>
> On Fri, Mar 13, 2015 at 7:57 PM, Dale Richardson <dale...@hotmail.com>
> wrote:
>
> > Mridul, I may have added some confusion by giving examples in completely
> > different areas. For example, the number of cores available for tasking on
> > each worker machine is a resource-controller-level configuration variable.
> > In standalone mode (i.e. using Spark's home-grown resource manager) the
> > configuration variable SPARK_WORKER_CORES is an item that Spark admins can
> > set (and we can use expressions for). The equivalent variable for YARN
> > (yarn.nodemanager.resource.cpu-vcores) is only used by YARN's node manager
> > setup, is set by YARN administrators, and is outside the control of Spark
> > (and most users). If you are not a cluster administrator then both
> > variables are irrelevant to you. The same goes for SPARK_WORKER_MEMORY.
> >
> > As for spark.executor.memory: as there is no way to know the attributes
> > of a machine before a task is allocated to it, we cannot use any of the
> > JVMInfo functions. For options like that, the expression parser can easily
> > be limited to supporting different byte units of scale (kb/mb/gb etc.) and
> > other configuration variables only.
> > Regards, Dale.
> >
> >
> >
> >
> > > Date: Fri, 13 Mar 2015 17:30:51 -0700
> > > Subject: Re: Spark config option 'expression language' feedback request
> > > From: mri...@gmail.com
> > > To: dale...@hotmail.com
> > > CC: dev@spark.apache.org
> > >
> > > Let me try to rephrase my query.
> > > How can a user specify, for example, what the executor memory should
> > > be or what the number of cores should be?
> > >
> > > I don't want a situation where some variables can be specified using
> > > one set of idioms (from this PR for example) and another set cannot
> > > be.
> > >
> > >
> > > Regards,
> > > Mridul
> > >
> > >
> > >
> > >
> > > On Fri, Mar 13, 2015 at 4:06 PM, Dale Richardson <dale...@hotmail.com>
> > > wrote:
> > > >
> > > >
> > > >
> > > > Thanks for your questions, Mridul.
> > > > I assume you are referring to how the functionality to query system
> > > > state works in YARN and Mesos?
> > > > The APIs used are the standard JVM APIs, so the functionality will
> > > > work without change. There is no real use case for using
> > > > 'physicalMemoryBytes' in these cases though, as the JVM size has
> > > > already been limited by the resource manager.
> > > > Regards, Dale.
> > > >> Date: Fri, 13 Mar 2015 08:20:33 -0700
> > > >> Subject: Re: Spark config option 'expression language' feedback
> > > >> request
> > > >> From: mri...@gmail.com
> > > >> To: dale...@hotmail.com
> > > >> CC: dev@spark.apache.org
> > > >>
> > > >> I am curious how you are going to support these over Mesos and YARN.
> > > >> Any configuration change like this should be applicable to all of
> > > >> them, not just local and standalone modes.
> > > >>
> > > >> Regards
> > > >> Mridul
> > > >>
> > > >> On Friday, March 13, 2015, Dale Richardson <dale...@hotmail.com>
> > > >> wrote:
> > > >>
> > > >> >
> > > >> > PR#4937 (https://github.com/apache/spark/pull/4937) is a feature to
> > > >> > allow for Spark configuration options (whether on the command line,
> > > >> > in an environment variable or in a configuration file) to be
> > > >> > specified via a simple expression language.
> > > >> >
> > > >> >
> > > >> > Such a feature has the following end-user benefits:
> > > >> > - Allows for the flexibility of specifying time intervals or byte
> > > >> > quantities in appropriate and easy-to-follow units, e.g. 1 week
> > > >> > rather than 604800 seconds
> > > >> >
> > > >> > - Allows for the scaling of a configuration option in relation to
> > > >> > system attributes, e.g.
> > > >> >
> > > >> > SPARK_WORKER_CORES = numCores - 1
> > > >> >
> > > >> > SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB
> > > >> >
> > > >> > - Gives the ability to scale multiple configuration options
> > > >> > together, e.g.:
> > > >> >
> > > >> > spark.driver.memory = 0.75 * physicalMemoryBytes
> > > >> >
> > > >> > spark.driver.maxResultSize = spark.driver.memory * 0.8
> > > >> >
> > > >> >
> > > >> > The following functions are currently supported by this PR:
> > > >> > NumCores:             Number of cores assigned to the JVM (usually
> > > >> > == physical machine cores)
> > > >> > PhysicalMemoryBytes:  Memory size of hosting machine
> > > >> >
> > > >> > JVMTotalMemoryBytes:  Current bytes of memory allocated to the JVM
> > > >> >
> > > >> > JVMMaxMemoryBytes:    Maximum number of bytes of memory available
> > > >> > to the JVM
> > > >> >
> > > >> > JVMFreeMemoryBytes:   maxMemoryBytes - totalMemoryBytes
> > > >> >
> > > >> >
> > > >> > I was wondering if anybody on the mailing list has any further
> > > >> > ideas on other functions that could be useful to have when
> > > >> > specifying Spark configuration options?
> > > >> > Regards, Dale.
> > > >> >
> > > >
> > > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
> > > For additional commands, e-mail: dev-h...@spark.apache.org
> > >
> >
> >
>
