[ https://issues.apache.org/jira/browse/SPARK-1520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972819#comment-13972819 ]
Sean Owen commented on SPARK-1520:
----------------------------------

Java 6 had a limit of 65535 entries per jar (the classic zip format's 16-bit entry count); the limit is much higher in Java 7, which supports zip64:
http://stackoverflow.com/questions/9616250/what-is-the-maximum-number-of-files-per-jar
https://blogs.oracle.com/xuemingshen/entry/zip64_support_for_4g_zipfile

When I build the assembly I find that it has 70948 files, so I think you are certainly onto something. I see the same behavior as you, and I am using the latest Java 6 and 7. I also note that "unzip -l" succeeds on the Java 6 version, but fails on the Java 7 version with:

{code}
error: expected central file header signature not found (file #70949).
  (please check that you have transferred or created the zipfile in the
  appropriate BINARY mode and that you have compiled UnZip properly)
{code}

This might not be Java's fault. It could be that SBT mishandles merging zip files when the input is Java 7's (zip64) output.

As a short-term solution, we can probably slim down the assembly jar. For example, fastutil is still in there for some reason and accounts for 10,666 files; it shouldn't be. You can get a quick view of where the files are with:

{code}
jar tf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar | grep -oE "(.+/)+" | uniq -c | sort -rn | head -100
{code}

{code}
   2883 breeze/linalg/operators/
   2034 it/unimi/dsi/fastutil/objects/
   1396 spire/std/
   1379 scala/tools/nsc/typechecker/
   1351 breeze/linalg/
   1215 it/unimi/dsi/fastutil/longs/
   1214 it/unimi/dsi/fastutil/ints/
   1213 it/unimi/dsi/fastutil/doubles/
   1211 it/unimi/dsi/fastutil/floats/
   1210 it/unimi/dsi/fastutil/shorts/
   1209 it/unimi/dsi/fastutil/chars/
   1209 it/unimi/dsi/fastutil/bytes/
   1187 scala/reflect/internal/
    896 com/google/common/collect/
    894 tachyon/thrift/
    886 spire/algebra/
    797 scala/tools/nsc/transform/
    749 scala/tools/nsc/interpreter/
    723 org/netlib/lapack/
    677 spire/math/
    ...
{code}

> Inclusion of breeze corrupts assembly when compiled with JDK7 and run on JDK6
> -----------------------------------------------------------------------------
>
>                 Key: SPARK-1520
>                 URL: https://issues.apache.org/jira/browse/SPARK-1520
>             Project: Spark
>          Issue Type: Bug
>          Components: MLlib, Spark Core
>            Reporter: Patrick Wendell
>            Priority: Blocker
>             Fix For: 1.0.0
>
>
> This is a real doozie - when compiling a Spark assembly with JDK7, the
> produced jar does not work well with JRE6. I confirmed the byte code being
> produced is JDK 6 compatible (major version 50). What happens is that the
> JRE silently fails to load any class files from the assembled jar.
> {code}
> $> sbt/sbt assembly/assembly
> $> /usr/lib/jvm/java-1.7.0-openjdk-amd64/bin/java -cp /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator
> usage: ./bin/spark-class org.apache.spark.ui.UIWorkloadGenerator [master] [FIFO|FAIR]
> $> /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp /home/patrick/Documents/spark/assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator
> Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/ui/UIWorkloadGenerator
> Caused by: java.lang.ClassNotFoundException: org.apache.spark.ui.UIWorkloadGenerator
> 	at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:323)
> 	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
> 	at java.lang.ClassLoader.loadClass(ClassLoader.java:268)
> Could not find the main class: org.apache.spark.ui.UIWorkloadGenerator. Program will exit.
> {code}
> I also noticed that if the jar is unzipped, and the classpath set to the
> current directory, it "just works".
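The description notes that the emitted bytecode was verified to be JDK 6 compatible ("major version 50"). That check can be scripted by reading the class file header directly; the sketch below is only illustrative (it inspects java.lang.Object's class file as a convenient stand-in, since every class file shares the same header layout; to check the assembly, point the stream at a class extracted from the jar instead):

```java
import java.io.DataInputStream;
import java.io.InputStream;

// Reads the header of a class file: a 4-byte magic number (0xCAFEBABE),
// then the minor and major version shorts. Major version 50 = Java 6,
// 51 = Java 7.
public class ClassVersionCheck {
    public static void main(String[] args) throws Exception {
        // Class files are always readable as resources, even from the JDK itself.
        InputStream raw = Object.class.getResourceAsStream("Object.class");
        DataInputStream in = new DataInputStream(raw);
        int magic = in.readInt();              // 0xCAFEBABE for a valid class file
        int minor = in.readUnsignedShort();
        int major = in.readUnsignedShort();
        in.close();
        System.out.println("magic=" + Integer.toHexString(magic)
                + " major=" + major + " minor=" + minor);
    }
}
```

On the JDK used to run it, this prints `magic=cafebabe` followed by that JDK's own class-file version.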
> Finally, if the assembly jar is compiled with JDK6, it also works. The error
> is seen with any class, not just the UIWorkloadGenerator. Also, this error
> doesn't exist in branch 0.9, only in master.
> *Isolation*
> -I ran a git bisection and this appeared after the MLlib sparse vector patch
> was merged:-
> https://github.com/apache/spark/commit/80c29689ae3b589254a571da3ddb5f9c866ae534
> SPARK-1212
> -I narrowed this down specifically to the inclusion of the breeze library.
> Just adding breeze to an older (unaffected) build triggered the issue.-
> I've found that if I just unpack and re-pack the jar (using `jar` from Java 6
> or 7) it always works:
> {code}
> $ cd assembly/target/scala-2.10/
> $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator # fails
> $ jar xvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar
> $ jar cvf spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar *
> $ /usr/lib/jvm/java-1.6.0-openjdk-amd64/bin/java -cp ./spark-assembly-1.0.0-SNAPSHOT-hadoop1.0.4.jar org.apache.spark.ui.UIWorkloadGenerator # succeeds
> {code}
> I also noticed something of note: the breeze package contains single
> directories with huge numbers of files in them (e.g. 2000+ class files in
> one directory). It's possible we are hitting some weird bug or corner case
> in the compatibility of the jar's internal storage format.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
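The unpack/re-pack workaround quoted above can also be expressed as a single streaming pass with java.util.zip: ZipInputStream walks the per-entry local headers rather than the central directory, and ZipOutputStream writes a fresh archive. This is only an illustrative sketch, not the reporter's procedure; the file names are stand-ins, and a three-entry sample archive stands in for the real ~70,000-entry assembly jar:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

public class Repack {
    public static void main(String[] args) throws IOException {
        File src = File.createTempFile("assembly", ".jar");
        File dst = File.createTempFile("repacked", ".jar");

        // Build a small stand-in "jar" with a few entries.
        ZipOutputStream seed = new ZipOutputStream(new FileOutputStream(src));
        for (int i = 0; i < 3; i++) {
            seed.putNextEntry(new ZipEntry("org/example/C" + i + ".class"));
            seed.write(new byte[] {(byte) 0xCA, (byte) 0xFE});
            seed.closeEntry();
        }
        seed.close();

        // Stream every entry from src into dst, as unpack + repack would.
        ZipInputStream in = new ZipInputStream(new FileInputStream(src));
        ZipOutputStream out = new ZipOutputStream(new FileOutputStream(dst));
        byte[] buf = new byte[8192];
        int copied = 0;
        ZipEntry e;
        while ((e = in.getNextEntry()) != null) {
            out.putNextEntry(new ZipEntry(e.getName()));
            int n;
            while ((n = in.read(buf)) > 0) {
                out.write(buf, 0, n);
            }
            out.closeEntry();
            copied++;
        }
        in.close();
        out.close();
        System.out.println("copied " + copied + " entries");
    }
}
```

Because the destination archive is written from scratch, its central directory is regenerated, which is consistent with the observation that a repacked jar always loads.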