[ https://issues.apache.org/jira/browse/SPARK-9672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-9672:
-----------------------------------

    Assignee: Apache Spark

> Drivers run in cluster mode on Mesos may not have spark-env variables available
> -------------------------------------------------------------------------------
>
>                 Key: SPARK-9672
>                 URL: https://issues.apache.org/jira/browse/SPARK-9672
>             Project: Spark
>          Issue Type: Bug
>          Components: Mesos, Spark Submit
>    Affects Versions: 1.4.1
>         Environment: Ubuntu 14.04
> Mesos 0.23 (compiled from source following instructions on mesos site)
> Spark 1.4 prebuilt for hadoop 2.6
> Test setup was a two-node Mesos cluster: one dedicated master and one 
> dedicated slave. Spark submissions occurred on the master and were directed 
> at a Mesos dispatcher running on the master.
>            Reporter: Patrick Shields
>            Assignee: Apache Spark
>            Priority: Minor
>
> This issue definitely affects Mesos mode, but may affect complex standalone 
> topologies as well.
> When running spark-submit with {{--deploy-mode cluster}}, environment 
> variables set in {{spark-env.sh}} that are not prefixed with {{SPARK_}} are 
> not available in the driver process. The behavior I expect is that any 
> variable set in {{spark-env.sh}} is available on the driver and all 
> executors.
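> For illustration, a hypothetical {{conf/spark-env.sh}} along these lines 
> (the variable names below are made up for this example):
> {noformat}
> # conf/spark-env.sh -- sketch; MY_APP_CONFIG_URL is a hypothetical name
> export MY_APP_CONFIG_URL="http://config.internal/app"  # no SPARK_ prefix: missing in the cluster-mode driver
> export SPARK_LOCAL_DIRS="/mnt/spark"                   # SPARK_ prefix: forwarded by spark-submit
> {noformat}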
> {{spark-env.sh}} is executed by {{load-spark-env.sh}}, which uses an 
> environment variable, {{SPARK_ENV_LOADED}} 
> [code|https://github.com/apache/spark/blob/master/bin/load-spark-env.sh#L25], 
> to ensure that it is only run once. When using the {{RestSubmissionClient}}, 
> spark-submit propagates all environment variables that are prefixed with 
> {{SPARK_}} 
> [code|https://github.com/apache/spark/blob/3c0156899dc1ec1f7dfe6d7c8af47fa6dc7d00bf/core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionClient.scala#L400] 
> to the {{MesosRestServer}}, where they are used to initialize the driver 
> [code|https://github.com/apache/spark/blob/3c0156899dc1ec1f7dfe6d7c8af47fa6dc7d00bf/core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala#L155]. 
> During this process, {{SPARK_ENV_LOADED}} is propagated to the new driver 
> process, since running spark-submit has already caused {{load-spark-env.sh}} 
> to be run on the submitter's machine 
> [code|https://github.com/apache/spark/blob/d86bbb4e286f16f77ba125452b07827684eafeed/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L371]. 
> As a result, when {{load-spark-env.sh}} is invoked by the 
> {{MesosClusterScheduler}}, {{SPARK_ENV_LOADED}} is already set and 
> {{spark-env.sh}} is never sourced.
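> The interaction, sketched in shell (paraphrasing the guard in 
> {{load-spark-env.sh}}, not quoting it verbatim):
> {noformat}
> # Paraphrase of the guard in bin/load-spark-env.sh:
> if [ -z "$SPARK_ENV_LOADED" ]; then
>   export SPARK_ENV_LOADED=1
>   . "${SPARK_CONF_DIR:-$SPARK_HOME/conf}/spark-env.sh"  # sources user env vars
> fi
> # SPARK_ENV_LOADED carries the SPARK_ prefix, so RestSubmissionClient forwards
> # it along with the driver submission. The driver process therefore starts
> # with SPARK_ENV_LOADED=1 already set, the guard short-circuits, and
> # spark-env.sh is never sourced on the host where the driver runs.
> {noformat}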
> [This gist|https://gist.github.com/pashields/9fe662d6ec5c079bdf70] shows the 
> testing setup I used while investigating this issue. An example invocation 
> looked like:
> {noformat}
> spark-1.5.0-SNAPSHOT-bin-custom-spark/bin/spark-submit --deploy-mode cluster \
>   --master mesos://172.31.34.154:7077 --class Test \
>   spark-env-var-test_2.10-0.1-SNAPSHOT.jar
> {noformat}
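> A minimal shell-level reproduction along the same lines (the variable name 
> and paths here are hypothetical):
> {noformat}
> # Hypothetical repro: MY_TEST_VAR is a made-up, non-SPARK_-prefixed variable.
> echo 'export MY_TEST_VAR=hello' >> "$SPARK_HOME/conf/spark-env.sh"
> "$SPARK_HOME/bin/spark-submit" --deploy-mode cluster \
>   --master mesos://172.31.34.154:7077 \
>   --class Test spark-env-var-test_2.10-0.1-SNAPSHOT.jar
> # Expected: the driver sees MY_TEST_VAR=hello (as it does in client mode).
> # Observed: in Mesos cluster mode the driver's environment lacks MY_TEST_VAR,
> # because the propagated SPARK_ENV_LOADED kept spark-env.sh from being
> # sourced where the driver runs.
> {noformat}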


