Repository: spark
Updated Branches:
  refs/heads/branch-1.1 46b698307 -> 7b798e10e


SPARK-1680: use configs for specifying environment variables on YARN

Note that this also documents spark.executorEnv.*, which to me means it's
public. If we don't want that, please speak up.

Author: Thomas Graves <tgra...@apache.org>

Closes #1512 from tgravescs/SPARK-1680 and squashes the following commits:

11525df [Thomas Graves] more doc changes
553bad0 [Thomas Graves] fix documentation
152bf7c [Thomas Graves] fix docs
5382326 [Thomas Graves] try fix docs
32f86a4 [Thomas Graves] use configs for specifying environment variables on YARN

(cherry picked from commit 41e0a21b22ccd2788dc079790788e505b0d4e37d)
Signed-off-by: Thomas Graves <tgra...@apache.org>


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7b798e10
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7b798e10
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/7b798e10

Branch: refs/heads/branch-1.1
Commit: 7b798e10e214cd407d3399e2cab9e3789f9a929e
Parents: 46b6983
Author: Thomas Graves <tgra...@apache.org>
Authored: Tue Aug 5 15:57:32 2014 -0500
Committer: Thomas Graves <tgra...@apache.org>
Committed: Tue Aug 5 15:57:42 2014 -0500

----------------------------------------------------------------------
 docs/configuration.md                           |  8 +++++++
 docs/running-on-yarn.md                         | 22 +++++++++++++++-----
 .../apache/spark/deploy/yarn/ClientBase.scala   | 13 ++++++++++++
 .../deploy/yarn/ExecutorRunnableUtil.scala      |  6 +++++-
 4 files changed, 43 insertions(+), 6 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/7b798e10/docs/configuration.md
----------------------------------------------------------------------
diff --git a/docs/configuration.md b/docs/configuration.md
index 1333465..6ae453d 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -206,6 +206,14 @@ Apart from these, the following properties are also available, and may be useful
     used during aggregation goes above this amount, it will spill the data into disks.
   </td>
 </tr>
+<tr>
+  <td><code>spark.executorEnv.[EnvironmentVariableName]</code></td>
+  <td>(none)</td>
+  <td>
+    Add the environment variable specified by <code>EnvironmentVariableName</code> to the Executor
+    process. The user can specify multiple of these to set multiple environment variables.
+  </td>
+</tr>
 </table>
 
 #### Shuffle Behavior

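As a quick usage sketch (not part of this commit; the names and values below
are illustrative), an application could set executor environment variables
through this property programmatically:

    import org.apache.spark.SparkConf

    // Each spark.executorEnv.* key becomes an environment variable in the
    // executor process; the JAVA_HOME/FOO values are made up for illustration.
    val conf = new SparkConf()
      .setAppName("executor-env-demo")
      .set("spark.executorEnv.JAVA_HOME", "/jdk64")
      .set("spark.executorEnv.FOO", "bar")
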
http://git-wip-us.apache.org/repos/asf/spark/blob/7b798e10/docs/running-on-yarn.md
----------------------------------------------------------------------
diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index 573930d..9bc20db 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -17,10 +17,6 @@ To build Spark yourself, refer to the [building with Maven guide](building-with-
 
 Most of the configs are the same for Spark on YARN as for other deployment modes. See the [configuration page](configuration.html) for more information on those. These are configs that are specific to Spark on YARN.
 
-#### Environment Variables
-
-* `SPARK_YARN_USER_ENV`, to add environment variables to the Spark processes launched on YARN. This can be a comma separated list of environment variables, e.g. `SPARK_YARN_USER_ENV="JAVA_HOME=/jdk64,FOO=bar"`.
-
 #### Spark Properties
 
 <table class="table">
@@ -110,7 +106,23 @@ Most of the configs are the same for Spark on YARN as for other deployment modes
   <td><code>spark.yarn.access.namenodes</code></td>
   <td>(none)</td>
   <td>
-    A list of secure HDFS namenodes your Spark application is going to access. For example, `spark.yarn.access.namenodes=hdfs://nn1.com:8032,hdfs://nn2.com:8032`. The Spark application must have acess to the namenodes listed and Kerberos must be properly configured to be able to access them (either in the same realm or in a trusted realm). Spark acquires security tokens for each of the namenodes so that the Spark application can access those remote HDFS clusters.
+    A list of secure HDFS namenodes your Spark application is going to access. For
+    example, `spark.yarn.access.namenodes=hdfs://nn1.com:8032,hdfs://nn2.com:8032`.
+    The Spark application must have access to the namenodes listed and Kerberos must
+    be properly configured to be able to access them (either in the same realm or in
+    a trusted realm). Spark acquires security tokens for each of the namenodes so that
+    the Spark application can access those remote HDFS clusters.
+  </td>
+</tr>
+<tr>
+  <td><code>spark.yarn.appMasterEnv.[EnvironmentVariableName]</code></td>
+  <td>(none)</td>
+  <td>
+     Add the environment variable specified by <code>EnvironmentVariableName</code> to the
+     Application Master process launched on YARN. The user can specify multiple of
+     these to set multiple environment variables. In yarn-cluster mode this controls
+     the environment of the Spark driver and in yarn-client mode it only controls
+     the environment of the executor launcher.
   </td>
 </tr>
 </table>

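A similar hedged sketch for the new Application Master property (values again
illustrative). Note the mode-dependent effect described above: in yarn-cluster
mode this shapes the driver's environment, in yarn-client mode only the
executor launcher's.

    import org.apache.spark.SparkConf

    // Illustrative only: spark.yarn.appMasterEnv.* keys are turned into
    // environment variables for the YARN Application Master process.
    val conf = new SparkConf()
      .set("spark.yarn.appMasterEnv.JAVA_HOME", "/jdk64")
      .set("spark.yarn.appMasterEnv.FOO", "bar")
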
http://git-wip-us.apache.org/repos/asf/spark/blob/7b798e10/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala
----------------------------------------------------------------------
diff --git a/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala b/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala
index 44e025b..1da0a1b 100644
--- a/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala
+++ b/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala
@@ -259,6 +259,14 @@ trait ClientBase extends Logging {
     localResources
   }
 
+  /** Get all application master environment variables set on this SparkConf */
+  def getAppMasterEnv: Seq[(String, String)] = {
+    val prefix = "spark.yarn.appMasterEnv."
+    sparkConf.getAll.filter{case (k, v) => k.startsWith(prefix)}
+      .map{case (k, v) => (k.substring(prefix.length), v)}
+  }
+
+
   def setupLaunchEnv(
       localResources: HashMap[String, LocalResource],
       stagingDir: String): HashMap[String, String] = {
@@ -276,6 +284,11 @@ trait ClientBase extends Logging {
     distCacheMgr.setDistFilesEnv(env)
     distCacheMgr.setDistArchivesEnv(env)
 
+    getAppMasterEnv.foreach { case (key, value) =>
+      YarnSparkHadoopUtil.addToEnvironment(env, key, value, File.pathSeparator)
+    }
+
+    // Keep this for backwards compatibility but users should move to the config
     sys.env.get("SPARK_YARN_USER_ENV").foreach { userEnvs =>
       // Allow users to specify some environment variables.
       YarnSparkHadoopUtil.setEnvFromInputString(env, userEnvs, File.pathSeparator)

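To show the idea behind getAppMasterEnv in isolation, here is a minimal,
self-contained sketch of the same prefix filter (the sample keys are invented):

    // Keep only keys carrying the spark.yarn.appMasterEnv. prefix, then
    // strip the prefix so the remainder is the environment variable name.
    val prefix = "spark.yarn.appMasterEnv."
    val allConf = Seq(
      ("spark.yarn.appMasterEnv.JAVA_HOME", "/jdk64"), // invented sample
      ("spark.executor.memory", "2g")                  // filtered out
    )
    val amEnv = allConf.filter { case (k, _) => k.startsWith(prefix) }
      .map { case (k, v) => (k.substring(prefix.length), v) }
    // amEnv == Seq(("JAVA_HOME", "/jdk64"))
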
http://git-wip-us.apache.org/repos/asf/spark/blob/7b798e10/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnableUtil.scala
----------------------------------------------------------------------
diff --git a/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnableUtil.scala b/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnableUtil.scala
index 4ba7133..71a9e42 100644
--- a/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnableUtil.scala
+++ b/yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnableUtil.scala
@@ -171,7 +171,11 @@ trait ExecutorRunnableUtil extends Logging {
     val extraCp = sparkConf.getOption("spark.executor.extraClassPath")
     ClientBase.populateClasspath(null, yarnConf, sparkConf, env, extraCp)
 
-    // Allow users to specify some environment variables
+    sparkConf.getExecutorEnv.foreach { case (key, value) =>
+      YarnSparkHadoopUtil.addToEnvironment(env, key, value, File.pathSeparator)
+    }
+
+    // Keep this for backwards compatibility but users should move to the config
     YarnSparkHadoopUtil.setEnvFromInputString(env, System.getenv("SPARK_YARN_USER_ENV"),
       File.pathSeparator)
 

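For reference, the legacy SPARK_YARN_USER_ENV value is a comma-separated list
of KEY=VALUE pairs. A rough sketch of that format (this is not Spark's actual
setEnvFromInputString implementation):

    // Parse "JAVA_HOME=/jdk64,FOO=bar" into key/value pairs.
    val userEnvs = "JAVA_HOME=/jdk64,FOO=bar"
    val parsed: Map[String, String] = userEnvs.split(",")
      .map(_.split("=", 2))
      .collect { case Array(k, v) => k -> v }
      .toMap
    // parsed == Map("JAVA_HOME" -> "/jdk64", "FOO" -> "bar")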
