[
https://issues.apache.org/jira/browse/SPARK-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025255#comment-14025255
]
Paul R. Brown commented on SPARK-2075:
--------------------------------------
The job is run by a Java client that connects to the master (using a
SparkContext).
Bundling is performed by a Maven build with two shade plugin invocations: one
to package a "driver" uberjar and one to package a "worker" uberjar. The
worker flavor is sent to the worker nodes; the driver flavor contains the code
to connect to the master and run the job. The Maven build runs against the JAR
from Maven Central, and the deployment uses the Spark 1.0.0 hadoop1 download.
(The Spark distribution is staged to S3 once, then downloaded onto the
master/worker nodes and set up during cluster provisioning.)
The Maven build uses the usual Scala setup with the library as a dependency and
the plugin:
{code}
<dependency>
  <groupId>org.scala-lang</groupId>
  <artifactId>scala-library</artifactId>
  <version>2.10.3</version>
</dependency>
{code}
{code}
<plugin>
  <groupId>net.alchim31.maven</groupId>
  <artifactId>scala-maven-plugin</artifactId>
  <executions>
    <execution>
      <goals>
        <goal>compile</goal>
        <goal>testCompile</goal>
      </goals>
    </execution>
  </executions>
  <configuration>
    <scalaVersion>2.10.3</scalaVersion>
    <jvmArgs>
      <jvmArg>-Xms64m</jvmArg>
      <jvmArg>-Xmx4096m</jvmArg>
    </jvmArgs>
  </configuration>
</plugin>
{code}
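For reference, the two shade-plugin executions described above can be sketched roughly as follows. This is a minimal illustration only: the execution ids and classifier names are assumptions, not taken from the actual build, and the real build would additionally carry filters/transformers per uberjar.
{code}
<!-- Sketch only: two shade executions attaching "driver" and "worker"
     uberjars; ids and classifier names are illustrative assumptions. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <id>driver-uberjar</id>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <shadedArtifactAttached>true</shadedArtifactAttached>
        <shadedClassifierName>driver</shadedClassifierName>
      </configuration>
    </execution>
    <execution>
      <id>worker-uberjar</id>
      <phase>package</phase>
      <goals><goal>shade</goal></goals>
      <configuration>
        <shadedArtifactAttached>true</shadedArtifactAttached>
        <shadedClassifierName>worker</shadedClassifierName>
      </configuration>
    </execution>
  </executions>
</plugin>
{code}
With <shadedArtifactAttached>, both shaded JARs are attached alongside the main artifact under their classifiers rather than replacing it.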
> Hadoop1 distribution of 1.0.0 does not contain classes expected by the Maven
> 1.0.0 artifact
> -------------------------------------------------------------------------------------------
>
> Key: SPARK-2075
> URL: https://issues.apache.org/jira/browse/SPARK-2075
> Project: Spark
> Issue Type: Bug
> Components: Build, Spark Core
> Affects Versions: 1.0.0
> Reporter: Paul R. Brown
>
> Running a job built against the Maven dep for 1.0.0 and the hadoop1
> distribution produces:
> {code}
> java.lang.ClassNotFoundException:
> org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1
> {code}
> Here's what's in the Maven dep as of 1.0.0:
> {code}
> jar tvf
> ~/.m2/repository/org/apache/spark/spark-core_2.10/1.0.0/spark-core_2.10-1.0.0.jar
> | grep 'rdd/RDD' | grep 'saveAs'
> 1519 Mon May 26 13:57:58 PDT 2014
> org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$1.class
> 1560 Mon May 26 13:57:58 PDT 2014
> org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$2.class
> {code}
> And here's what's in the hadoop1 distribution:
> {code}
> jar tvf spark-assembly-1.0.0-hadoop1.0.4.jar | grep 'rdd/RDD' | grep 'saveAs'
> {code}
> I.e., it's not there. It is in the hadoop2 distribution:
> {code}
> jar tvf spark-assembly-1.0.0-hadoop2.2.0.jar | grep 'rdd/RDD' | grep 'saveAs'
> 1519 Mon May 26 07:29:54 PDT 2014
> org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$1.class
> 1560 Mon May 26 07:29:54 PDT 2014
> org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$2.class
> {code}
--
This message was sent by Atlassian JIRA
(v6.2#6252)