Re: SparkKMeans.scala from examples will show: NoClassDefFoundError: breeze/linalg/Vector
I have given this a try in a spark-shell and I still get many Allocation Failures.

On Thursday, July 3, 2014 9:51 AM, Xiangrui Meng men...@gmail.com wrote:

SparkKMeans is just example code showing a bare-bones implementation of k-means. To run k-means on big datasets, please use the KMeans implementation in MLlib directly: http://spark.apache.org/docs/latest/mllib-clustering.html

-Xiangrui

On Wed, Jul 2, 2014 at 9:50 AM, Wanda Hawk wanda_haw...@yahoo.com wrote: I can run it now with the suggested method. However, I have encountered a new problem that I have not faced before ...
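Xiangrui's suggestion of using the MLlib implementation can be tried directly from spark-shell. The sketch below is only an illustration: the input file kmeans_data.txt, k = 2, and the 20-iteration cap are made-up placeholders, and the calls follow the MLlib 1.0 clustering guide linked above.

```shell
# Sketch: drive MLlib's KMeans from spark-shell instead of the SparkKMeans
# example. Paths and parameters are hypothetical; adjust to your install.
cd /home/wanda/spark-1.0.0
./bin/spark-shell <<'EOF'
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

// Each input line is expected to be space-separated doubles.
val data = sc.textFile("kmeans_data.txt")
             .map(line => Vectors.dense(line.split(' ').map(_.toDouble)))
             .cache()

val model = KMeans.train(data, 2, 20)  // k = 2, at most 20 iterations
model.clusterCenters.foreach(println)
EOF
```

MLlib's KMeans handles the partitioning and caching details that the bare example code does not, which is why it is the recommended path for the ~7 GB input discussed below.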
Re: SparkKMeans.scala from examples will show: NoClassDefFoundError: breeze/linalg/Vector
I want to make some minor modifications in SparkKMeans.scala, so running the basic example won't do. I have also packed my code into a jar file with sbt. It completes successfully, but when I try to run it with java -jar myjar.jar I get the same error:

Exception in thread "main" java.lang.NoClassDefFoundError: breeze/linalg/Vector
        at java.lang.Class.getDeclaredMethods0(Native Method)
        at java.lang.Class.privateGetDeclaredMethods(Class.java:2531)
        at java.lang.Class.getMethod0(Class.java:2774)
        at java.lang.Class.getMethod(Class.java:1663)
        at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)
        at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)

If scalac -d classes/ SparkKMeans.scala can't see my classpath, why does it succeed in compiling and not give the same error? The error itself, NoClassDefFoundError, means that the classes are available at compile time but, for some reason I cannot figure out, not at run time. Does anyone know why?

Thank you

On Tuesday, July 1, 2014 7:03 PM, Xiangrui Meng men...@gmail.com wrote: You can use either bin/run-example or bin/spark-submit to run example code.
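One detail explains the java -jar behaviour: when the -jar option is used, the JVM ignores both -cp and the CLASSPATH variable, so the assembly jar on CLASSPATH never reaches the application class loader. Two workarounds, sketched with the jar and class names used in this thread (they may differ in your build):

```shell
# Option 1: put both jars on an explicit classpath and name the main class.
# (With "java -jar" this -cp value would be silently ignored.)
java -cp myjar.jar:/home/wanda/spark-1.0.0/assembly/target/scala-2.10/spark-assembly-1.0.0-hadoop1.0.4.jar \
     SparkKMeans

# Option 2: let spark-submit assemble the classpath for you.
/home/wanda/spark-1.0.0/bin/spark-submit --class SparkKMeans myjar.jar
```

This is also why compilation succeeded: scalac read CLASSPATH normally, while the java -jar launch path never consulted it.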
Re: SparkKMeans.scala from examples will show: NoClassDefFoundError: breeze/linalg/Vector
Got it! Ran the jar with spark-submit. Thanks!
Re: SparkKMeans.scala from examples will show: NoClassDefFoundError: breeze/linalg/Vector
The scripts that Xiangrui mentions set up the classpath. Can you run ./run-example for the provided example successfully? What you can try is to set SPARK_PRINT_LAUNCH_COMMAND=1 and then call run-example -- that will show you the exact java command used to run the example at the start of execution. Assuming you can run the examples successfully, you should be able to just copy that command and add your jar to the front of the classpath. If that works, you can start removing extra jars (run-example puts all the example jars in the classpath, which you won't need). As you said, the error you see indicates that the class is not available/seen at runtime, but it's hard to tell why.

On Wed, Jul 2, 2014 at 2:13 AM, Wanda Hawk wanda_haw...@yahoo.com wrote: I want to make some minor modifications in SparkKMeans.scala, so running the basic example won't do.
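Yana's debugging recipe, spelled out as commands. The example arguments here are placeholders; check run-example's usage message for the exact form in your Spark version.

```shell
cd /home/wanda/spark-1.0.0

# Print the exact java command that run-example builds, then run the example.
export SPARK_PRINT_LAUNCH_COMMAND=1
./bin/run-example org.apache.spark.examples.SparkKMeans input.txt 2 0.1

# Copy the printed "java -cp ..." line, prepend your own jar to the -cp value,
# and swap in your main class to launch the modified code the same way.
```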
Re: SparkKMeans.scala from examples will show: NoClassDefFoundError: breeze/linalg/Vector
I can run it now with the suggested method. However, I have encountered a new problem that I have not faced before (sent another email with that one, but here it goes again ...).

I ran SparkKMeans with a big file (~7 GB of data) for one iteration with spark-0.8.0, with this line in my .bashrc:

export _JAVA_OPTIONS="-Xmx15g -Xms15g -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails"

It finished in a decent time, ~50 seconds, and I had only a few Full GC messages from Java (a max of 4-5). Now, using the same export but with spark-1.0.0 (and running it with spark-submit), the first loop never finishes and I get a lot of:

18.537: [GC (Allocation Failure) --[PSYoungGen: 11796992K->11796992K(13762560K)] 11797442K->11797450K(13763072K), 2.8420311 secs] [Times: user=5.81 sys=2.12, real=2.85 secs]

or

31.867: [Full GC (Ergonomics) [PSYoungGen: 11796992K->3177967K(13762560K)] [ParOldGen: 505K->505K(512K)] 11797497K->3178473K(13763072K), [Metaspace: 37646K->37646K(1081344K)], 2.3053283 secs] [Times: user=37.74 sys=0.11, real=2.31 secs]

I tried passing different parameters to the JVM through spark-submit, but the results are the same. This happens with Java 1.7 and also with Java 1.8. I do not know what "Ergonomics" stands for ... How can I get decent performance from spark-1.0.0, considering that spark-0.8.0 did not need any fine tuning of the garbage collection method (the default worked well)?

Thank you

On Wednesday, July 2, 2014 4:45 PM, Yana Kadiyska yana.kadiy...@gmail.com wrote: The scripts that Xiangrui mentions set up the classpath...
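Rather than a global _JAVA_OPTIONS export, the memory and GC flags can be passed per run through spark-submit. A sketch for a local run; the class name, jar, and program arguments are illustrative, and the flag names should be checked against spark-submit --help for your version:

```shell
/home/wanda/spark-1.0.0/bin/spark-submit \
  --class SparkKMeans \
  --master local[8] \
  --driver-memory 15g \
  --driver-java-options "-verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails" \
  myjar.jar big_input.txt 10 0.1
```

Scoping the options this way also avoids _JAVA_OPTIONS leaking into every JVM started from the shell, which can make GC logs from unrelated processes look like Spark output.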
Re: SparkKMeans.scala from examples will show: NoClassDefFoundError: breeze/linalg/Vector
You can use either bin/run-example or bin/spark-submit to run example code. scalac -d classes/ SparkKMeans.scala doesn't recognize the Spark classpath. There are examples in the official doc: http://spark.apache.org/docs/latest/quick-start.html#where-to-go-from-here

-Xiangrui

On Tue, Jul 1, 2014 at 4:39 AM, Wanda Hawk wanda_haw...@yahoo.com wrote:

Hello,

I have installed spark-1.0.0 with Scala 2.10.3. I have built Spark with sbt/sbt assembly and added /home/wanda/spark-1.0.0/assembly/target/scala-2.10/spark-assembly-1.0.0-hadoop1.0.4.jar to my CLASSPATH variable. Then I went to ../spark-1.0.0/examples/src/main/scala/org/apache/spark/examples, created a new directory classes, and compiled SparkKMeans.scala with scalac -d classes/ SparkKMeans.scala. Then I navigated to classes (I commented out this line in the scala file: package org.apache.spark.examples) and tried to run it with java -cp . SparkKMeans, and I get the following error:

Exception in thread "main" java.lang.NoClassDefFoundError: breeze/linalg/Vector
        at java.lang.Class.getDeclaredMethods0(Native Method)
        at java.lang.Class.privateGetDeclaredMethods(Class.java:2531)
        at java.lang.Class.getMethod0(Class.java:2774)
        at java.lang.Class.getMethod(Class.java:1663)
        at sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)
        at sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)
Caused by: java.lang.ClassNotFoundException: breeze.linalg.Vector
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ... 6 more

The jar /home/wanda/spark-1.0.0/assembly/target/scala-2.10/spark-assembly-1.0.0-hadoop1.0.4.jar contains the breeze/linalg/Vector* path; I even tried to unpack it and put it in CLASSPATH, but it does not seem to pick it up. I am currently running Java 1.8:

java version "1.8.0_05"
Java(TM) SE Runtime Environment (build 1.8.0_05-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.5-b02, mixed mode)

What am I doing wrong?
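For reference, the failing sequence in the question can be made to work by keeping the assembly jar on the runtime classpath as well: java -cp . restricts the class loader to the current directory only, so breeze is invisible even though scalac saw it. A sketch using the paths from the message above:

```shell
cd /home/wanda/spark-1.0.0/examples/src/main/scala/org/apache/spark/examples
ASSEMBLY=/home/wanda/spark-1.0.0/assembly/target/scala-2.10/spark-assembly-1.0.0-hadoop1.0.4.jar

# Compile: scalac sees the jar via -classpath, just as it did via $CLASSPATH.
scalac -classpath "$ASSEMBLY" -d classes/ SparkKMeans.scala

# Run: the jar must ALSO be on the runtime classpath, not just "."
cd classes
java -cp ".:$ASSEMBLY" SparkKMeans
```

This mirrors what run-example and spark-submit do automatically, which is why they are the recommended launchers.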