[ https://issues.apache.org/jira/browse/SPARK-9672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-9672:
-----------------------------------
Assignee: (was: Apache Spark)
> Drivers run in cluster mode on Mesos may not have spark-env variables
> available
> -------------------------------------------------------------------------------
>
> Key: SPARK-9672
> URL: https://issues.apache.org/jira/browse/SPARK-9672
> Project: Spark
> Issue Type: Bug
> Components: Mesos, Spark Submit
> Affects Versions: 1.4.1
> Environment: Ubuntu 14.04
> Mesos 0.23 (compiled from source following the instructions on the Mesos
> site)
> Spark 1.4 prebuilt for Hadoop 2.6
> Test setup was a two-node Mesos cluster: one dedicated master and one
> dedicated slave. Spark submissions occurred on the master and were directed
> at a Mesos dispatcher running on the master.
> Reporter: Patrick Shields
> Priority: Minor
>
> This issue definitely affects Mesos mode, but may affect complex standalone
> topologies as well.
> When running spark-submit with {noformat}--deploy-mode cluster{noformat},
> environment variables set in {{spark-env.sh}} that are not prefixed with
> {{SPARK_}} are not available in the driver process. The behavior I expect is
> that any variable set in {{spark-env.sh}} is available on the driver and
> all executors.
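> A hypothetical {{conf/spark-env.sh}} makes the symptom concrete (the variable
> names below are illustrative, not taken from the test setup):
> {noformat}
> # conf/spark-env.sh, present on every node
> export MY_APP_CONFIG_URL=http://example.com/conf  # no SPARK_ prefix: missing
>                                                   # in the cluster-mode driver
> export SPARK_LOCAL_DIRS=/mnt/spark                # SPARK_ prefix: available
> {noformat}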
> {{spark-env.sh}} is executed by {{load-spark-env.sh}}, which uses the
> environment variable {{SPARK_ENV_LOADED}}
> [[code|https://github.com/apache/spark/blob/master/bin/load-spark-env.sh#L25]]
> to ensure that it is only run once.
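> The guard looks roughly like this (simplified from {{load-spark-env.sh}}):
> {noformat}
> # bin/load-spark-env.sh (simplified)
> if [ -z "$SPARK_ENV_LOADED" ]; then
>   export SPARK_ENV_LOADED=1
>   # ... locate and source conf/spark-env.sh ...
> fi
> {noformat}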
> When using the {{RestSubmissionClient}}, spark-submit propagates all
> environment variables that are prefixed with {{SPARK_}}
> [[code|https://github.com/apache/spark/blob/3c0156899dc1ec1f7dfe6d7c8af47fa6dc7d00bf/core/src/main/scala/org/apache/spark/deploy/rest/RestSubmissionClient.scala#L400]]
> to the {{MesosRestServer}}, where they are used to initialize the driver
> [[code|https://github.com/apache/spark/blob/3c0156899dc1ec1f7dfe6d7c8af47fa6dc7d00bf/core/src/main/scala/org/apache/spark/deploy/rest/StandaloneRestServer.scala#L155]].
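> The effect of that prefix filter can be sketched with a hypothetical
> submitter-side environment (only the {{SPARK_}}-prefixed entries travel with
> the submission):
> {noformat}
> # Environment of the spark-submit process on the submitter's machine
> MY_APP_CONFIG_URL=...  # dropped: no SPARK_ prefix
> SPARK_LOCAL_DIRS=...   # forwarded
> SPARK_ENV_LOADED=1     # forwarded too, which is the root of this bug
> {noformat}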
> During this process, {{SPARK_ENV_LOADED}} is propagated to the new driver
> process, since running spark-submit has already caused {{load-spark-env.sh}}
> to be run on the submitter's machine
> [[code|https://github.com/apache/spark/blob/d86bbb4e286f16f77ba125452b07827684eafeed/core/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosClusterScheduler.scala#L371]].
> Now, when {{load-spark-env.sh}} is called by the {{MesosClusterScheduler}},
> {{SPARK_ENV_LOADED}} is already set and {{spark-env.sh}} is never sourced.
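> Putting the chain together (a summary of the steps above, not a literal log):
> {noformat}
> submitter: spark-submit sources load-spark-env.sh  -> SPARK_ENV_LOADED=1
> submitter: RestSubmissionClient forwards every SPARK_* variable,
>            including SPARK_ENV_LOADED
> driver:    starts with SPARK_ENV_LOADED=1 already in its environment
> driver:    load-spark-env.sh sees the guard and skips spark-env.sh
> result:    non-SPARK_ variables from spark-env.sh never reach the driver
> {noformat}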
> [This gist|https://gist.github.com/pashields/9fe662d6ec5c079bdf70] shows the
> testing setup I used while investigating this issue. An example invocation
> looked like:
> {noformat}
> spark-1.5.0-SNAPSHOT-bin-custom-spark/bin/spark-submit --deploy-mode cluster \
>   --master mesos://172.31.34.154:7077 --class Test \
>   spark-env-var-test_2.10-0.1-SNAPSHOT.jar
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)