[21/50] git commit: Updated documentation about the YARN v2.2 build process

pwendell Wed, 11 Dec 2013 23:13:33 -0800

Updated documentation about the YARN v2.2 build process


Project: http://git-wip-us.apache.org/repos/asf/incubator-spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-spark/commit/f2fb4b42
Tree: http://git-wip-us.apache.org/repos/asf/incubator-spark/tree/f2fb4b42
Diff: http://git-wip-us.apache.org/repos/asf/incubator-spark/diff/f2fb4b42

Branch: refs/heads/scala-2.10
Commit: f2fb4b422863059476816df07ca7ea18f62e3a9d
Parents: 5d46025
Author: Ali Ghodsi <[email protected]>
Authored: Fri Dec 6 00:43:12 2013 -0800
Committer: Ali Ghodsi <[email protected]>
Committed: Fri Dec 6 16:31:26 2013 -0800

----------------------------------------------------------------------
 docs/building-with-maven.md | 4 ++++
 docs/index.md               | 2 +-
 docs/running-on-yarn.md     | 8 ++++++++
 3 files changed, 13 insertions(+), 1 deletion(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-spark/blob/f2fb4b42/docs/building-with-maven.md
----------------------------------------------------------------------
diff --git a/docs/building-with-maven.md b/docs/building-with-maven.md
index 19c01e1..a508786 100644
--- a/docs/building-with-maven.md
+++ b/docs/building-with-maven.md
@@ -45,6 +45,10 @@ For Apache Hadoop 2.x, 0.23.x, Cloudera CDH MRv2, and other Hadoop versions with
     # Cloudera CDH 4.2.0 with MapReduce v2
     $ mvn -Phadoop2-yarn -Dhadoop.version=2.0.0-cdh4.2.0 -Dyarn.version=2.0.0-chd4.2.0 -DskipTests clean package
 
+Hadoop versions 2.2.x and newer can be built by setting the ```new-yarn``` and the ```yarn.version``` as follows:
+       mvn -Dyarn.version=2.2.0 -Dhadoop.version=2.2.0 -Pnew-yarn
+
+The build process handles Hadoop 2.2.x as a special case that uses the directory ```new-yarn```, which supports the new YARN API. Furthermore, for this version, the build depends on artifacts published by the spark-project to enable Akka 2.0.5 to work with protobuf 2.5. 
 
 ## Spark Tests in Maven ##
 

http://git-wip-us.apache.org/repos/asf/incubator-spark/blob/f2fb4b42/docs/index.md
----------------------------------------------------------------------
diff --git a/docs/index.md b/docs/index.md
index bd386a8..56e1142 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -56,7 +56,7 @@ Hadoop, you must build Spark against the same version that your cluster uses.
 By default, Spark links to Hadoop 1.0.4. You can change this by setting the
 `SPARK_HADOOP_VERSION` variable when compiling:
 
-    SPARK_HADOOP_VERSION=1.2.1 sbt/sbt assembly
+    SPARK_HADOOP_VERSION=2.2.0 sbt/sbt assembly
 
 In addition, if you wish to run Spark on [YARN](running-on-yarn.md), set
 `SPARK_YARN` to `true`:

http://git-wip-us.apache.org/repos/asf/incubator-spark/blob/f2fb4b42/docs/running-on-yarn.md
----------------------------------------------------------------------
diff --git a/docs/running-on-yarn.md b/docs/running-on-yarn.md
index 68fd6c2..3ec656c 100644
--- a/docs/running-on-yarn.md
+++ b/docs/running-on-yarn.md
@@ -17,6 +17,7 @@ This can be built by setting the Hadoop version and `SPARK_YARN` environment var
 The assembled JAR will be something like this:
 `./assembly/target/scala-{{site.SCALA_VERSION}}/spark-assembly_{{site.SPARK_VERSION}}-hadoop2.0.5.jar`.
 
+The build process now also supports new YARN versions (2.2.x). See below.
 
 # Preparations
 
@@ -111,9 +112,16 @@ For example:
     SPARK_YARN_APP_JAR=examples/target/scala-{{site.SCALA_VERSION}}/spark-examples-assembly-{{site.SPARK_VERSION}}.jar \
     MASTER=yarn-client ./spark-shell
 
+# Building Spark for Hadoop/YARN 2.2.x
+
+Hadoop 2.2.x users must build Spark and publish it locally. The SBT build process handles Hadoop 2.2.x as a special case. This version of Hadoop has new YARN API changes and depends on a Protobuf version (2.5) that is not compatible with the Akka version (2.0.5) that Spark uses. Therefore, if the Hadoop version (e.g. set through ```SPARK_HADOOP_VERSION```) starts with 2.2.0 or higher then the build process will depend on Akka artifacts distributed by the Spark project compatible with Protobuf 2.5. Furthermore, the build process then uses the directory ```new-yarn``` (stead of ```yarn```), which supports the new YARN API. The build process should seamlessly work out of the box. 
+
+See [Building Spark with Maven](building-with-maven.md) for instructions on how to build Spark using the Maven process.
+
 # Important Notes
 
 - We do not requesting container resources based on the number of cores. Thus the numbers of cores given via command line arguments cannot be guaranteed.
 - The local directories used for spark will be the local directories configured for YARN (Hadoop Yarn config yarn.nodemanager.local-dirs). If the user specifies spark.local.dir, it will be ignored.
 - The --files and --archives options support specifying file names with the # similar to Hadoop. For example you can specify: --files localtest.txt#appSees.txt and this will upload the file you have locally named localtest.txt into HDFS but this will be linked to by the name appSees.txt and your application should use the name as appSees.txt to reference it when running on YARN.
 - The --addJars option allows the SparkContext.addJar function to work if you are using it with local files. It does not need to be used if you are using it with HDFS, HTTP, HTTPS, or FTP files.
+- YARN 2.2.x users cannot simply depend on the Spark packages without building Spark, as the published Spark artifacts are compiled to work with the pre 2.2 API. Those users must build Spark and publish it locally.  
\ No newline at end of file
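
Note (not part of the commit): the profile choice described in the building-with-maven.md hunk above can be sketched as a small shell helper. This is only an illustration under stated assumptions. `maven_build_cmd` is a hypothetical name, the helper assumes it is given a YARN-capable Hadoop version, and the trailing `-DskipTests clean package` goals are carried over from the CDH example line, not from the new 2.2.0 line. It prints the command and does not run Maven.

```shell
# Hypothetical helper (not part of Spark): print the Maven command that the
# documentation in this commit suggests for a given YARN-capable Hadoop version.
maven_build_cmd() {
  version="$1"
  major="${version%%.*}"   # text before the first dot, e.g. "2"
  rest="${version#*.}"     # text after the first dot, e.g. "2.0"
  minor="${rest%%.*}"      # second numeric component, e.g. "2"

  # Hadoop 2.2.x and newer use the new-yarn profile; older YARN-capable
  # versions (2.0.x, 0.23.x, CDH MRv2) use hadoop2-yarn.
  if [ "$major" -gt 2 ] || { [ "$major" -eq 2 ] && [ "$minor" -ge 2 ]; }; then
    profile="new-yarn"
  else
    profile="hadoop2-yarn"
  fi

  # Assumption: the goals below are copied from the CDH example in the docs.
  echo "mvn -P${profile} -Dhadoop.version=${version} -Dyarn.version=${version} -DskipTests clean package"
}
```

Calling `maven_build_cmd 2.2.0` selects `-Pnew-yarn`, while `maven_build_cmd 2.0.0-cdh4.2.0` selects `-Phadoop2-yarn`.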
