Repository: spark
Updated Branches:
  refs/heads/master c8bee932c -> 1272b2034


[SPARK-22151] PYTHONPATH not picked up from the spark.yarn.appMasterEnv properly

In YARN cluster mode, setting PYTHONPATH via 
spark.yarn.appMasterEnv.PYTHONPATH does not work.

The YARN Client code reads the client's environment variables:
val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
But a value set through spark.yarn.appMasterEnv is placed in the local env 
map, not in sys.env, so this code never sees it and then overwrites it.

As a result, the Python path set in spark.yarn.appMasterEnv is not applied.
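
The override described above can be reproduced with a simplified model. This is only a sketch, not the actual Client.scala code: the object name, the `oldBehaviour` helper, the `processEnv` parameter (a stand-in for `sys.env`) and the `":"` separator (a stand-in for `ApplicationConstants.CLASS_PATH_SEPARATOR`) are all assumptions made for illustration.

```scala
import scala.collection.mutable

// Simplified model of the pre-fix logic; names here are hypothetical.
object PythonPathOverrideSketch {
  // Stand-in for ApplicationConstants.CLASS_PATH_SEPARATOR (assumption).
  val Separator = ":"

  /** Builds PYTHONPATH from the process environment only and writes it into `env`. */
  def oldBehaviour(
      env: mutable.Map[String, String], // launch env, already holding appMasterEnv values
      processEnv: Map[String, String],  // stand-in for sys.env on the submitting client
      pythonPath: Seq[String]): Unit = {
    if (pythonPath.nonEmpty) {
      val pythonPathStr =
        (processEnv.get("PYTHONPATH").toSeq ++ pythonPath).mkString(Separator)
      // Plain assignment: whatever appMasterEnv put under PYTHONPATH is replaced.
      env("PYTHONPATH") = pythonPathStr
    }
  }

  def main(args: Array[String]): Unit = {
    // Pretend spark.yarn.appMasterEnv.PYTHONPATH=./addon/python/ was already applied.
    val env = mutable.HashMap("PYTHONPATH" -> "./addon/python/")
    oldBehaviour(env, Map.empty, Seq("pyspark.zip", "py4j.zip"))
    println(env("PYTHONPATH")) // prints "pyspark.zip:py4j.zip"; ./addon/python/ is gone
  }
}
```

In this model the value that came from spark.yarn.appMasterEnv is lost because the map entry is assigned rather than extended.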

You can work around this in cluster mode by setting PYTHONPATH on the 
client, for example:

PYTHONPATH=./addon/python/ spark-submit

## What changes were proposed in this pull request?
In Client.scala, PYTHONPATH was being overridden. The code now appends 
values to PYTHONPATH instead of overriding them.
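
The append behaviour can be sketched in isolation as follows. This is a minimal model under stated assumptions, not the patched Client.scala itself: the object and method names, the `executorEnv` map (a stand-in for the executor environment held in SparkConf), the `processEnv` parameter (a stand-in for `sys.env`) and the `":"` separator (a stand-in for `ApplicationConstants.CLASS_PATH_SEPARATOR`) are hypothetical.

```scala
import scala.collection.mutable

// Simplified model of the append logic; names here are hypothetical.
object PythonPathAppendSketch {
  // Stand-in for ApplicationConstants.CLASS_PATH_SEPARATOR (assumption).
  val Separator = ":"

  /** Appends the computed Python path to any PYTHONPATH already present in each map. */
  def appendPythonPath(
      env: mutable.Map[String, String],         // AM launch env (may hold appMasterEnv values)
      executorEnv: mutable.Map[String, String], // stand-in for the executor env in SparkConf
      processEnv: Map[String, String],          // stand-in for sys.env on the client
      pythonPath: Seq[String]): Unit = {
    if (pythonPath.nonEmpty) {
      val pythonPathList = processEnv.get("PYTHONPATH").toSeq ++ pythonPath
      // Existing entries come first, then the client and Spark-provided entries.
      env("PYTHONPATH") =
        (env.get("PYTHONPATH").toSeq ++ pythonPathList).mkString(Separator)
      executorEnv("PYTHONPATH") =
        (executorEnv.get("PYTHONPATH").toSeq ++ pythonPathList).mkString(Separator)
    }
  }

  def main(args: Array[String]): Unit = {
    val env = mutable.HashMap("PYTHONPATH" -> "./addon/python/")
    val executorEnv = mutable.HashMap.empty[String, String]
    appendPythonPath(env, executorEnv, Map("PYTHONPATH" -> "/client/py"), Seq("pyspark.zip"))
    println(env("PYTHONPATH"))         // prints "./addon/python/:/client/py:pyspark.zip"
    println(executorEnv("PYTHONPATH")) // prints "/client/py:pyspark.zip"
  }
}
```

Reading the existing entry as an `Option` and concatenating keeps the code free of a special case for the "nothing set yet" situation.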

## How was this patch tested?
Added log statements to ApplicationMaster.scala to check for environment 
variable PYTHONPATH, ran a spark job in cluster mode before the change and 
verified the issue. Performed the same test after the change and verified the 
fix.

Author: pgandhi <pgan...@oath.com>

Closes #21468 from pgandhi999/SPARK-22151.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1272b203
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1272b203
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/1272b203

Branch: refs/heads/master
Commit: 1272b2034d4eed4bfe60a49e1065871b3a3f96e0
Parents: c8bee93
Author: pgandhi <pgan...@oath.com>
Authored: Wed Jul 18 14:07:03 2018 -0500
Committer: Thomas Graves <tgra...@apache.org>
Committed: Wed Jul 18 14:07:03 2018 -0500

----------------------------------------------------------------------
 .../src/main/scala/org/apache/spark/deploy/yarn/Client.scala | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/1272b203/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
----------------------------------------------------------------------
diff --git a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
index 793d012..ed9879c 100644
--- a/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
+++ b/resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala
@@ -811,10 +811,12 @@ private[spark] class Client(
 
     // Finally, update the Spark config to propagate PYTHONPATH to the AM and executors.
     if (pythonPath.nonEmpty) {
-      val pythonPathStr = (sys.env.get("PYTHONPATH") ++ pythonPath)
+      val pythonPathList = (sys.env.get("PYTHONPATH") ++ pythonPath)
+      env("PYTHONPATH") = (env.get("PYTHONPATH") ++ pythonPathList)
         .mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)
-      env("PYTHONPATH") = pythonPathStr
-      sparkConf.setExecutorEnv("PYTHONPATH", pythonPathStr)
+      val pythonPathExecutorEnv = (sparkConf.getExecutorEnv.toMap.get("PYTHONPATH") ++
+        pythonPathList).mkString(ApplicationConstants.CLASS_PATH_SEPARATOR)
+      sparkConf.setExecutorEnv("PYTHONPATH", pythonPathExecutorEnv)
     }
 
     if (isClusterMode) {

