Taeyun Kim created SPARK-1825:
---------------------------------

             Summary: Windows Spark fails to work with Linux YARN
                 Key: SPARK-1825
                 URL: https://issues.apache.org/jira/browse/SPARK-1825
             Project: Spark
          Issue Type: Bug
            Reporter: Taeyun Kim


A Spark client running on Windows fails to work with a YARN cluster running on Linux.
This is a cross-platform problem.

On the YARN side, Hadoop 2.4.0 resolved this issue:
https://issues.apache.org/jira/browse/YARN-1824

But the Spark YARN module does not incorporate the new YARN API yet, so the problem
persists for Spark.

First, the following source files should be changed:
- /yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala
- /yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ExecutorRunnableUtil.scala

The changes are as follows:
- Replace .$() with .$$()
- Replace File.pathSeparator, where it joins the Environment.CLASSPATH.name entries,
with ApplicationConstants.CLASS_PATH_SEPARATOR (importing
org.apache.hadoop.yarn.api.ApplicationConstants is required for this)
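To illustrate, here is a minimal sketch of the change using the Hadoop 2.4.0 API
(the jar name and exact expressions are hypothetical, not copied from the actual
Spark code):

```scala
// Illustrative sketch only -- not the actual ClientBase.scala code.
import java.io.File
import org.apache.hadoop.yarn.api.ApplicationConstants
import org.apache.hadoop.yarn.api.ApplicationConstants.Environment

// Before: $() and File.pathSeparator expand on the *client* OS, so a
// Windows client writes %PWD% and ';' into the Linux launch script:
val before = Environment.PWD.$() + File.pathSeparator + "mySpark.jar"

// After: $$() and CLASS_PATH_SEPARATOR emit cross-platform tokens
// ({{PWD}} and <CPS>) that the NodeManager expands on the *server* OS:
val after = Environment.PWD.$$() +
  ApplicationConstants.CLASS_PATH_SEPARATOR + "mySpark.jar"
```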

Unless the above changes are applied, launch_container.sh will contain invalid shell
script statements (since they will contain Windows-specific variable references and
path separators), and the job will fail.
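To make the symptom concrete, a hypothetical example (not taken from a real log) of
the kind of line a Windows client writes into launch_container.sh before the fix,
and what it should look like once the server-side expansion is used:

```
# Before the fix: %VAR% and ';' are Windows syntax, invalid in sh:
export CLASSPATH="%PWD%;%HADOOP_CONF_DIR%"

# After the fix: the client sends cross-platform tokens
# ({{PWD}}<CPS>{{HADOOP_CONF_DIR}}), and the Linux NodeManager
# expands them for its own OS when generating the script:
export CLASSPATH="$PWD:$HADOOP_CONF_DIR"
```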
Also, the following symptoms should be fixed (I could not find the relevant
source code):
- The SPARK_HOME environment variable is copied verbatim into launch_container.sh.
It should be converted to the path format of the server OS, or, better, a
separate environment variable or a configuration variable should be introduced.
- The '%HADOOP_MAPRED_HOME%' string still appears in launch_container.sh after the
above changes are applied; maybe I missed a few lines.

I'm not sure whether this covers everything, since I'm new to both Spark and YARN.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
