Hi Reynold,

Those are some very good questions.
Re: Known libraries
There are a number of well-known libraries we could use to implement this 
feature, including MVEL, OGNL, JBoss EL, or even Spring's EL. I looked at 
using them to prototype this feature in the beginning, but they all ended up 
bringing in a lot of code to service a pretty small functional requirement. The 
prime requirements I was trying to meet were:
1. Be able to specify quantities in kb, mb, gb etc. transparently.
2. Be able to specify some options as fractions of system attributes, e.g. cpuCores * 0.8
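For concreteness, here is a minimal sketch (hypothetical, not the PR's actual code) of what requirement 1 boils down to: turning a suffixed quantity like "512mb" into a byte count, returning None for anything that doesn't parse.

```scala
import scala.util.Try

// Hypothetical sketch of requirement 1 (not the PR's actual code):
// parse quantities like "512mb" or "1.5gb" into a byte count.
object ByteQuantity {
  private val units = Map("kb" -> (1L << 10), "mb" -> (1L << 20), "gb" -> (1L << 30))

  def parse(s: String): Option[Double] = {
    val t = s.trim.toLowerCase
    units.collectFirst {
      case (suffix, mult) if t.endsWith(suffix) => (t.dropRight(suffix.length).trim, mult)
    }.flatMap { case (num, mult) => Try(num.toDouble * mult).toOption }
  }
}
```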
By implementing just this functionality and nothing else, I figured I was 
constraining things enough that end users get useful functionality, but not 
enough to shoot themselves in the foot in new and interesting ways. I couldn't 
see a nice way of limiting the expressiveness of 3rd-party libraries to this 
extent.
I'd be happy to revisit the feasibility of pulling in one of the 3rd-party 
libraries if you think that approach has more merit, but I do caution that we 
may be opening a Pandora's box of potential functionality. Those libraries 
carry a lot of (potentially excess) functionality.
Re: Code complexity
I wrote the bare minimum code I could come up with to service the 
above-mentioned functionality, then refactored it to use a stacked-traits 
pattern, which increased the code size by about a further 30%. The expression 
code as it stands is pretty minimal, and has more than 120 unit tests proving 
its functionality. More than half of the code is taken up by utility classes 
that allow easy reference to byte quantities and time units. The design was 
deliberately limited to meeting the above requirements and not much more, to 
reduce the chance of other subtleties raising their heads.
Re: Workarounds
It would be pretty simple to implement fallback functionality to disable 
expression parsing:
1. Globally, a configuration option to disable all expression parsing and fall 
back to plain Java property parsing.
2. Locally, a known prefix that disables expression parsing for that option.
This should give enough workarounds to keep things running in the unlikely 
event that something crops up.
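A rough sketch of how those two escape hatches could fit together (the "raw:" prefix and flag names are purely illustrative, not what the PR implements):

```scala
import scala.util.Try

// Illustrative sketch only; the prefix name and flag are hypothetical.
object ConfigResolver {
  // Stand-in for the real expression evaluator: accepts bare numeric
  // literals and nothing else.
  private def evaluate(expr: String): Option[Double] =
    Try(expr.trim.toDouble).toOption

  def resolve(raw: String, expressionsEnabled: Boolean): String = {
    val RawPrefix = "raw:" // hypothetical per-option escape prefix
    if (!expressionsEnabled) raw                      // 1. global kill switch
    else if (raw.startsWith(RawPrefix))
      raw.stripPrefix(RawPrefix)                      // 2. per-option escape
    else evaluate(raw).map(_.toString).getOrElse(raw) // otherwise try the expression
  }
}
```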
Re: Error messages
Regarding your comment about nice error messages, I'd have to agree with you: 
they would have been nice. In the end I just return an Option[Double] to the 
calling code if the entire string parses correctly. Given the additional 
complexity that adding error messages involved, I retrospectively justify this 
by asking: how much info do you need to debug an expression like 
'cpuCores * 0.8'? :)
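To illustrate that all-or-nothing Option[Double] contract, here is a toy evaluator (handling only '*' chains, not the PR's actual grammar): either every factor parses and the caller gets Some(result), or it gets None with no further diagnostics.

```scala
import scala.util.Try

// Toy illustration of the all-or-nothing Option[Double] contract
// (handles only '*' chains; not the PR's actual grammar).
object ExprEval {
  private val vars =
    Map("cpucores" -> Runtime.getRuntime.availableProcessors.toDouble)

  private def term(t: String): Option[Double] = {
    val k = t.trim.toLowerCase
    vars.get(k).orElse(Try(k.toDouble).toOption)
  }

  // Either every factor parses and we return Some(product),
  // or the caller gets None and no error message.
  def eval(expr: String): Option[Double] =
    expr.split('*').map(term).foldLeft(Option(1.0)) { (acc, v) =>
      for (a <- acc; b <- v) yield a * b
    }
}
```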
Thanks for the feedback.
Regards,
Dale.
> From: r...@databricks.com
> Date: Fri, 13 Mar 2015 11:26:44 -0700
> Subject: Re: Spark config option 'expression language' feedback request
> To: dale...@hotmail.com
> CC: dev@spark.apache.org
> 
> This is an interesting idea.
> 
> Are there well known libraries for doing this? Config is the one place
> where it would be great to have something ridiculously simple, so it is
> more or less bug free. I'm concerned about the complexity in this patch and
> subtle bugs that it might introduce to config options that users will have
> no workarounds. Also I believe it is fairly hard for nice error messages to
> propagate when using Scala's parser combinator.
> 
> 
> On Fri, Mar 13, 2015 at 3:07 AM, Dale Richardson <dale...@hotmail.com>
> wrote:
> 
> >
> > PR#4937 ( https://github.com/apache/spark/pull/4937) is a feature to
> > allow for Spark configuration options (whether on command line, environment
> > variable or a configuration file) to be specified via a simple expression
> > language.
> >
> >
> > Such a feature has the following end-user benefits:
> > - Allows for the flexibility in specifying time intervals or byte
> > quantities in appropriate and easy-to-follow units, e.g. 1 week rather
> > than 604800 seconds
> >
> > - Allows for the scaling of a configuration option in relation to system
> > attributes, e.g.
> >
> > SPARK_WORKER_CORES = numCores - 1
> >
> > SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB
> >
> > - Gives the ability to scale multiple configuration options together eg:
> >
> > spark.driver.memory = 0.75 * physicalMemoryBytes
> >
> > spark.driver.maxResultSize = spark.driver.memory * 0.8
> >
> >
> > The following functions are currently supported by this PR:
> > NumCores:             Number of cores assigned to the JVM (usually ==
> > Physical machine cores)
> > PhysicalMemoryBytes:  Memory size of hosting machine
> >
> > JVMTotalMemoryBytes:  Current bytes of memory allocated to the JVM
> >
> > JVMMaxMemoryBytes:    Maximum number of bytes of memory available to the
> > JVM
> >
> > JVMFreeMemoryBytes:   maxMemoryBytes - totalMemoryBytes
> >
> >
> > I was wondering if anybody on the mailing list has any further ideas on
> > other functions that could be useful to have when specifying spark
> > configuration options?
> > Regards,
> > Dale.
> >
