Thanks for digging around here. I think there are a few distinct issues.

1. Properties containing the '=' character need to be escaped.
I was able to load properties fine as long as I escaped the '='
character, but maybe we should document this:

== spark-defaults.conf ==
spark.foo a\=B
== shell ==
scala> sc.getConf.get("spark.foo")
res2: String = a=B

2. spark.driver.extraJavaOptions, when set in the properties file,
doesn't affect the driver when running in client mode (always the case
for Mesos). We should probably document this. In this case you need to
either use --driver-java-options or set SPARK_SUBMIT_OPTS (see the
example after this list).

3. Arguments aren't propagated on Mesos (this might be a consequence of
the other issues, or a separate bug).
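
For (2), concretely, either of these should work with the property from
Cody's example (I haven't re-verified the SPARK_SUBMIT_OPTS form on Mesos):

== shell ==
$ ./bin/spark-shell --driver-java-options "-Dfoo.bar.baz=23"
$ SPARK_SUBMIT_OPTS="-Dfoo.bar.baz=23" ./bin/spark-shell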

- Patrick

On Wed, Jul 30, 2014 at 3:10 PM, Cody Koeninger <c...@koeninger.org> wrote:
> In addition, spark.executor.extraJavaOptions does not seem to behave as I
> would expect; the -D system properties set there don't seem to be
> propagated to the executors.
>
>
> $ cat conf/spark-defaults.conf
>
> spark.master
> mesos://zk://etl-01.mxstg:2181,etl-02.mxstg:2181,etl-03.mxstg:2181/masters
> spark.executor.extraJavaOptions -Dfoo.bar.baz=23
> spark.driver.extraJavaOptions -Dfoo.bar.baz=23
>
>
> $ ./bin/spark-shell
>
> scala> sc.getConf.get("spark.executor.extraJavaOptions")
> res0: String = -Dfoo.bar.baz=23
>
> scala> sc.parallelize(1 to 100).map{ i => (
>      |  java.net.InetAddress.getLocalHost.getHostName,
>      |  System.getProperty("foo.bar.baz")
>      | )}.collect
>
> res1: Array[(String, String)] = Array((dn-01.mxstg,null),
> (dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null),
> (dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null),
> (dn-01.mxstg,null), (dn-01.mxstg,null), (dn-01.mxstg,null),
> (dn-01.mxstg,null), (dn-01.mxstg,null), (dn-02.mxstg,null),
> (dn-02.mxstg,null), ...
>
>
>
> Note that this is a Mesos deployment, although I wouldn't expect that to
> affect the availability of spark.driver.extraJavaOptions in a local
> spark-shell.
>
>
> On Wed, Jul 30, 2014 at 4:18 PM, Cody Koeninger <c...@koeninger.org> wrote:
>
>> Either whitespace or an equals sign is a valid key/value separator in a
>> properties file. Here's an example:
>>
>> $ cat conf/spark-defaults.conf
>> spark.driver.extraJavaOptions -Dfoo.bar.baz=23
>>
>> $ ./bin/spark-shell -v
>> Using properties file: /opt/spark/conf/spark-defaults.conf
>> Adding default property: spark.driver.extraJavaOptions=-Dfoo.bar.baz=23
>>
>>
>> scala>  System.getProperty("foo.bar.baz")
>> res0: String = null
>>
>>
>> If you add double quotes, they become part of the resulting string
>> value.
>>
>>
>> $ cat conf/spark-defaults.conf
>> spark.driver.extraJavaOptions "-Dfoo.bar.baz=23"
>>
>> $ ./bin/spark-shell -v
>> Using properties file: /opt/spark/conf/spark-defaults.conf
>> Adding default property: spark.driver.extraJavaOptions="-Dfoo.bar.baz=23"
>>
>> scala>  System.getProperty("foo.bar.baz")
>> res0: String = null
>>
>>
>> Neither one of those affects the issue; the underlying problem in my case
>> seems to be that bin/spark-class uses the SPARK_SUBMIT_OPTS and
>> SPARK_JAVA_OPTS environment variables, but nothing parses
>> spark-defaults.conf before the java process is started.
>>
>> Here's an example of the process running when only spark-defaults.conf is
>> being used:
>>
>> $ ps -ef | grep spark
>>
>> 514       5182  2058  0 21:05 pts/2    00:00:00 bash ./bin/spark-shell -v
>>
>> 514       5189  5182  4 21:05 pts/2    00:00:22 /usr/local/java/bin/java
>> -cp
>> ::/opt/spark/conf:/opt/spark/lib/spark-assembly-1.0.1-hadoop2.3.0-mr1-cdh5.0.2.jar:/etc/hadoop/conf-mx
>> -XX:MaxPermSize=128m -Djava.library.path= -Xms512m -Xmx512m
>> org.apache.spark.deploy.SparkSubmit spark-shell -v --class
>> org.apache.spark.repl.Main
>>
>>
>> Here's an example of it when the command line --driver-java-options is
>> used (and thus things work):
>>
>>
>> $ ps -ef | grep spark
>> 514       5392  2058  0 21:15 pts/2    00:00:00 bash ./bin/spark-shell -v
>> --driver-java-options -Dfoo.bar.baz=23
>>
>> 514       5399  5392 80 21:15 pts/2    00:00:06 /usr/local/java/bin/java
>> -cp
>> ::/opt/spark/conf:/opt/spark/lib/spark-assembly-1.0.1-hadoop2.3.0-mr1-cdh5.0.2.jar:/etc/hadoop/conf-mx
>> -XX:MaxPermSize=128m -Dfoo.bar.baz=23 -Djava.library.path= -Xms512m
>> -Xmx512m org.apache.spark.deploy.SparkSubmit spark-shell -v
>> --driver-java-options -Dfoo.bar.baz=23 --class org.apache.spark.repl.Main
>>
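>> Given that, a possible workaround (I haven't verified it end to end, just
>> reasoned from the scripts) would be to export SPARK_SUBMIT_OPTS before
>> launching, since spark-class does read it:
>>
>> $ export SPARK_SUBMIT_OPTS="-Dfoo.bar.baz=23"
>> $ ./bin/spark-shell -v
>>
>> which should get -Dfoo.bar.baz=23 onto the driver JVM's command line the
>> same way --driver-java-options does.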
>>
>>
>>
>> On Wed, Jul 30, 2014 at 3:43 PM, Patrick Wendell <pwend...@gmail.com>
>> wrote:
>>
>>> Cody - in your example you are using the '=' character, but in our
>>> documentation and tests we use whitespace to separate the key and
>>> value in the defaults file.
>>>
>>> docs: http://spark.apache.org/docs/latest/configuration.html
>>>
>>> spark.driver.extraJavaOptions -Dfoo.bar.baz=23
>>>
>>> I'm not sure if the Java properties file parser will try to interpret
>>> the equals sign. If so, you might need to do this:
>>>
>>> spark.driver.extraJavaOptions "-Dfoo.bar.baz=23"
>>>
>>> Do those work for you?
>>>
>>> On Wed, Jul 30, 2014 at 1:32 PM, Marcelo Vanzin <van...@cloudera.com>
>>> wrote:
>>> > Hi Cody,
>>> >
>>> > Could you file a bug for this if there isn't one already?
>>> >
>>> > For system properties SparkSubmit should be able to read those
>>> > settings and do the right thing, but that obviously won't work for
>>> > other JVM options... the current code should work fine in cluster mode
>>> > though, since the driver is a different process. :-)
>>> >
>>> >
>>> > On Wed, Jul 30, 2014 at 1:12 PM, Cody Koeninger <c...@koeninger.org>
>>> > wrote:
>>> >> We were previously using SPARK_JAVA_OPTS to set java system properties
>>> >> via -D.
>>> >>
>>> >> This was used for properties that varied on a per-deployment-environment
>>> >> basis, but needed to be available in the spark shell and workers.
>>> >>
>>> >> On upgrading to 1.0, we saw that SPARK_JAVA_OPTS had been deprecated, and
>>> >> replaced by spark-defaults.conf and command line arguments to spark-submit
>>> >> or spark-shell.
>>> >>
>>> >> However, setting spark.driver.extraJavaOptions and
>>> >> spark.executor.extraJavaOptions in spark-defaults.conf is not a replacement
>>> >> for SPARK_JAVA_OPTS:
>>> >>
>>> >>
>>> >> $ cat conf/spark-defaults.conf
>>> >> spark.driver.extraJavaOptions=-Dfoo.bar.baz=23
>>> >>
>>> >> $ ./bin/spark-shell
>>> >>
>>> >> scala> System.getProperty("foo.bar.baz")
>>> >> res0: String = null
>>> >>
>>> >>
>>> >> $ ./bin/spark-shell --driver-java-options "-Dfoo.bar.baz=23"
>>> >>
>>> >> scala> System.getProperty("foo.bar.baz")
>>> >> res0: String = 23
>>> >>
>>> >>
>>> >> Looking through the shell scripts for spark-submit and spark-class, I can
>>> >> see why this is; parsing spark-defaults.conf from bash could be brittle.
>>> >>
>>> >> But from an ergonomic point of view, it's a step back to go from a
>>> >> set-it-and-forget-it configuration in spark-env.sh, to requiring command
>>> >> line arguments.
>>> >>
>>> >> I can solve this with an ad-hoc script to wrap spark-shell with the
>>> >> appropriate arguments, but I wanted to bring the issue up to see if anyone
>>> >> else had run into it, or had any direction for a general solution (beyond
>>> >> parsing java properties files from bash).
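>>> >>
>>> >> For reference, the kind of wrapper I have in mind is roughly this (just a
>>> >> sketch; the grep/sed "parsing" and the hard-coded /opt/spark path are only
>>> >> illustrative, not something robust):
>>> >>
>>> >> #!/bin/bash
>>> >> # Read the driver options out of spark-defaults.conf and pass them on the
>>> >> # command line, since --driver-java-options is known to work.
>>> >> OPTS=$(grep '^spark.driver.extraJavaOptions' /opt/spark/conf/spark-defaults.conf \
>>> >>   | sed 's/^spark.driver.extraJavaOptions[ =]*//')
>>> >> exec /opt/spark/bin/spark-shell --driver-java-options "$OPTS" "$@"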
>>> >
>>> >
>>> >
>>> > --
>>> > Marcelo
>>>
>>
>>
