[
https://issues.apache.org/jira/browse/SPARK-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14252979#comment-14252979
]
Sun Rui edited comment on SPARK-2075 at 12/19/14 5:33 AM:
----------------------------------------------------------
Since `mvn release` is built for Hadoop 1.0.4, I don't understand why the
RDD.class bytecode in the Maven spark-core artifact differs from that in the
pre-built binary for Hadoop 1.x: both are built for Hadoop 1.x and both
have the same version, 1.1.0.
According to [~zsxwing]'s PR, it seems difficult to guarantee identical
bytecode for Hadoop 1.x and 2.x. So maybe we need to publish two versions of
each module to Maven, one for Hadoop 1.x and one for Hadoop 2.x, for example,
spark-core_2.10-1.1.0-hadoop1.jar and spark-core_2.10-1.1.0-hadoop2.jar?
> Anonymous classes are missing from Spark distribution
> -----------------------------------------------------
>
> Key: SPARK-2075
> URL: https://issues.apache.org/jira/browse/SPARK-2075
> Project: Spark
> Issue Type: Bug
> Components: Build, Spark Core
> Affects Versions: 1.0.0
> Reporter: Paul R. Brown
> Priority: Critical
>
> Running a job built against the Maven dep for 1.0.0 and the hadoop1
> distribution produces:
> {code}
> java.lang.ClassNotFoundException:
> org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1
> {code}
> Here's what's in the Maven dep as of 1.0.0:
> {code}
> jar tvf ~/.m2/repository/org/apache/spark/spark-core_2.10/1.0.0/spark-core_2.10-1.0.0.jar | grep 'rdd/RDD' | grep 'saveAs'
> 1519 Mon May 26 13:57:58 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$1.class
> 1560 Mon May 26 13:57:58 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$2.class
> {code}
> And here's what's in the hadoop1 distribution:
> {code}
> jar tvf spark-assembly-1.0.0-hadoop1.0.4.jar | grep 'rdd/RDD' | grep 'saveAs'
> {code}
> I.e., it's not there. It is in the hadoop2 distribution:
> {code}
> jar tvf spark-assembly-1.0.0-hadoop2.2.0.jar | grep 'rdd/RDD' | grep 'saveAs'
> 1519 Mon May 26 07:29:54 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$1.class
> 1560 Mon May 26 07:29:54 PDT 2014 org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$2.class
> {code}
> {code}
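The `jar tvf | grep` comparison quoted above can also be scripted, which may help when checking whether two artifacts carry the same RDD classes and the same bytecode. The following is only a minimal sketch, not part of the issue or of any Spark tooling: the function names `class_digests` and `compare_jars` and the default prefix are assumptions made for illustration, and "same bytecode" is taken here to mean byte-identical class files.

```python
import hashlib
import zipfile


def class_digests(jar_path, prefix):
    """Map each .class entry whose name starts with `prefix` to a SHA-256 digest."""
    digests = {}
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            if name.startswith(prefix) and name.endswith(".class"):
                digests[name] = hashlib.sha256(jar.read(name)).hexdigest()
    return digests


def compare_jars(left_jar, right_jar, prefix="org/apache/spark/rdd/RDD"):
    """Return (only_in_left, only_in_right, differing) as sorted lists of entry names.

    `differing` holds entries present in both jars whose bytes are not identical.
    The default prefix is an assumption chosen to match the listings in this issue.
    """
    left = class_digests(left_jar, prefix)
    right = class_digests(right_jar, prefix)
    only_in_left = sorted(set(left) - set(right))
    only_in_right = sorted(set(right) - set(left))
    differing = sorted(
        name for name in set(left) & set(right) if left[name] != right[name]
    )
    return only_in_left, only_in_right, differing
```

Calling `compare_jars` with the path to the Maven spark-core jar and the path to an assembly jar would list anonymous classes such as the `saveAsTextFile` closures under the first or second result if they exist in only one of the two, and `RDD.class` under the third if its bytes differ.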
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)