[
https://issues.apache.org/jira/browse/SPARK-2075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14021465#comment-14021465
]
Patrick Wendell edited comment on SPARK-2075 at 6/8/14 10:13 PM:
-----------------------------------------------------------------
Okay I did some more digging. I think the issue is that the anonymous classes
used by saveAsTextFile are not guaranteed to be compiled to the same names every
time the Scala source is compiled. In the Hadoop 1 build they end up with
shortened (numbered) names, whereas in the Hadoop 2 build they use the longer,
method-derived names. Strangely, saveAsTextFile seems to be the only affected
function. I confirmed this by diffing the contents of the Hadoop 1 and Hadoop 2 jars:
{code}
$ jar tvf spark-1.0.0-bin-hadoop1/lib/spark-assembly-1.0.0-hadoop*.jar | grep "rdd\/RDD\\$" | awk '{ print $8;}' | sort > hadoop1
$ jar tvf spark-1.0.0-bin-hadoop2/lib/spark-assembly-1.0.0-hadoop*.jar | grep "rdd\/RDD\\$" | awk '{ print $8;}' | sort > hadoop2
$ diff hadoop1 hadoop2
23a24
> org/apache/spark/rdd/RDD$$anonfun$28$$anonfun$apply$13.class
27,29d27
< org/apache/spark/rdd/RDD$$anonfun$30$$anonfun$apply$13.class
< org/apache/spark/rdd/RDD$$anonfun$30.class
< org/apache/spark/rdd/RDD$$anonfun$31.class
90a89,90
> org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$1.class
> org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$2.class
{code}
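The comparison step can also be scripted end to end. Here is a minimal, self-contained sketch of the diffing logic, with illustrative file contents standing in for the real `jar tvf ... | awk | sort` listings:

```shell
# Sketch of the comparison above: comm -3 prints lines unique to either file,
# i.e. classes present in only one build. The printf contents below stand in
# for the real jar listings and are illustrative only.
printf '%s\n' 'org/apache/spark/rdd/RDD$$anonfun$30.class' \
              'org/apache/spark/rdd/RDD$$anonfun$31.class' | sort > hadoop1
printf '%s\n' 'org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$1.class' \
              'org/apache/spark/rdd/RDD$$anonfun$saveAsTextFile$2.class' | sort > hadoop2
comm -3 hadoop1 hadoop2   # column 1: only in hadoop1, column 2: only in hadoop2
```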
I'm still a bit confused, though, because I didn't think these anonymous classes
would show up in the bytecode of the user application, so the renaming shouldn't
matter (which is presumably why Scala allows it). The only entries in the public
binary interface are the saveAsTextFile methods themselves:
{code}
javap RDD | grep saveAsText
  public void saveAsTextFile(java.lang.String);
  public void saveAsTextFile(java.lang.String, java.lang.Class<? extends org.apache.hadoop.io.compress.CompressionCodec>);
{code}
[~paulrbrown] could you explain how you are bundling and submitting your
application to the Spark cluster?
> Hadoop1 distribution of 1.0.0 does not contain classes expected by the Maven
> 1.0.0 artifact
> -------------------------------------------------------------------------------------------
>
> Key: SPARK-2075
> URL: https://issues.apache.org/jira/browse/SPARK-2075
> Project: Spark
> Issue Type: Bug
> Components: Build, Spark Core
> Affects Versions: 1.0.0
> Reporter: Paul R. Brown
>
> Running a job built against the Maven dep for 1.0.0 and the hadoop1
> distribution produces:
> {code}
> java.lang.ClassNotFoundException:
> org.apache.spark.rdd.RDD$$anonfun$saveAsTextFile$1
> {code}
> Here's what's in the Maven dep as of 1.0.0:
> {code}
> jar tvf ~/.m2/repository/org/apache/spark/spark-core_2.10/1.0.0/spark-core_2.10-1.0.0.jar | grep 'rdd/RDD' | grep 'saveAs'
>   1519 Mon May 26 13:57:58 PDT 2014 org/apache/spark/rdd/RDD$anonfun$saveAsTextFile$1.class
>   1560 Mon May 26 13:57:58 PDT 2014 org/apache/spark/rdd/RDD$anonfun$saveAsTextFile$2.class
> {code}
> And here's what's in the hadoop1 distribution:
> {code}
> jar tvf spark-assembly-1.0.0-hadoop1.0.4.jar | grep 'rdd/RDD' | grep 'saveAs'
> {code}
> I.e., it's not there. It is in the hadoop2 distribution:
> {code}
> jar tvf spark-assembly-1.0.0-hadoop2.2.0.jar | grep 'rdd/RDD' | grep 'saveAs'
>   1519 Mon May 26 07:29:54 PDT 2014 org/apache/spark/rdd/RDD$anonfun$saveAsTextFile$1.class
>   1560 Mon May 26 07:29:54 PDT 2014 org/apache/spark/rdd/RDD$anonfun$saveAsTextFile$2.class
> {code}
--
This message was sent by Atlassian JIRA
(v6.2#6252)