Re: SparkKMeans.scala from examples will show: NoClassDefFoundError: breeze/linalg/Vector

2014-07-03 Thread Wanda Hawk
I have given this a try in a spark-shell and I still get many Allocation 

On Thursday, July 3, 2014 9:51 AM, Xiangrui Meng wrote:

SparkKMeans is just example code showing a bare-bones
implementation of k-means. To run k-means on big datasets, please use
the KMeans implemented in MLlib directly:


Re: SparkKMeans.scala from examples will show: NoClassDefFoundError: breeze/linalg/Vector

2014-07-02 Thread Wanda Hawk
I want to make some minor modifications to SparkKMeans.scala, so running the
basic example won't do.
I have also packaged my code into a jar file with sbt. The build completes
successfully, but when I try to run it with java -jar myjar.jar I get the same
Exception in thread "main" java.lang.NoClassDefFoundError: breeze/linalg/Vector
        at java.lang.Class.getDeclaredMethods0(Native Method)
        at java.lang.Class.privateGetDeclaredMethods(
        at java.lang.Class.getMethod0(
        at java.lang.Class.getMethod(
        at sun.launcher.LauncherHelper.getMainMethod(
        at sun.launcher.LauncherHelper.checkAndLoadMain(

If scalac -d classes/ SparkKMeans.scala can't see my classpath, why does it
compile successfully instead of giving the same error?
NoClassDefFoundError itself means that the classes were available at
compile time but, for some reason I cannot figure out, are not available at
run time. Does anyone know why?

Thank you
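A minimal diagnostic sketch, assuming only the JDK and the Scala standard library; the object name ClasspathCheck and its isLoadable helper are hypothetical and not part of Spark. One relevant JVM rule: java -jar takes user classes only from the named jar (plus any Class-Path entry in its manifest) and ignores both -cp and the CLASSPATH environment variable, so a class that exists elsewhere on disk can still be invisible at run time. Running a check like this with the same launcher and flags as the failing program shows whether breeze.linalg.Vector can actually be loaded.

```scala
// Hypothetical diagnostic (not part of Spark): reports whether each named
// class can be loaded from the classpath this JVM was actually started with.
object ClasspathCheck {
  // True if the system (application) class loader can find and load the class.
  def isLoadable(className: String): Boolean =
    try {
      // initialize = false: locate and load the class without running static initializers
      Class.forName(className, false, ClassLoader.getSystemClassLoader)
      true
    } catch {
      case _: ClassNotFoundException => false
      case _: LinkageError           => false // e.g. NoClassDefFoundError for a missing dependency
    }

  def main(args: Array[String]): Unit = {
    // Defaults to the class named in the stack trace when no arguments are given.
    val names = if (args.nonEmpty) args.toSeq else Seq("breeze.linalg.Vector")
    for (name <- names)
      println(s"$name loadable at run time: ${isLoadable(name)}")
  }
}
```

Passing fully qualified class names as arguments checks those instead of the default.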

On Tuesday, July 1, 2014 7:03 PM, Xiangrui Meng wrote:

You can use either bin/run-example or bin/spark-submit to run example
code. scalac -d classes/ SparkKMeans.scala doesn't recognize the Spark
classpath. There are examples in the official doc:

Re: SparkKMeans.scala from examples will show: NoClassDefFoundError: breeze/linalg/Vector

2014-07-02 Thread Wanda Hawk
Got it! Ran the jar with spark-submit. Thanks!

On Wednesday, July 2, 2014 9:16 AM, Wanda Hawk wrote:

I want to make some minor modifications to SparkKMeans.scala, so running the
basic example won't do.
I have also packaged my code into a jar file with sbt. The build completes
successfully, but when I try to run it with java -jar myjar.jar I get the same
Exception in thread "main" java.lang.NoClassDefFoundError: breeze/linalg/Vector
        at java.lang.Class.getDeclaredMethods0(Native Method)
        at java.lang.Class.privateGetDeclaredMethods(
        at java.lang.Class.getMethod0(
        at java.lang.Class.getMethod(
        at sun.launcher.LauncherHelper.getMainMethod(
        at sun.launcher.LauncherHelper.checkAndLoadMain(

If scalac -d classes/ SparkKMeans.scala can't see my classpath, why does it
compile successfully instead of giving the same error?
NoClassDefFoundError itself means that the classes were available at
compile time but, for some reason I cannot figure out, are not available at
run time. Does anyone know why?

Thank you

Re: SparkKMeans.scala from examples will show: NoClassDefFoundError: breeze/linalg/Vector

2014-07-02 Thread Yana Kadiyska
The scripts that Xiangrui mentions set up the classpath. Can you run
./run-example for the provided example successfully?

What you can try is to set SPARK_PRINT_LAUNCH_COMMAND=1 and then call
run-example; that will show you the exact java command used to run
the example at the start of execution. Assuming you can run examples
successfully, you should be able to just copy that command and add your jar
to the front of the classpath. If that works, you can start removing extra
jars (run-example puts all the example jars on the classpath, which you won't

As you said, the error you see indicates that the class is not
available/seen at runtime, but it's hard to tell why.
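When pruning jars from a copied launch command, it can help to know which classpath entry actually supplies a class: the application class loader searches classpath entries in order and uses the first match, which is also why putting your own jar at the front matters. A minimal sketch, assuming only the JDK and the Scala standard library; the object name WhichJar is hypothetical.

```scala
// Hypothetical helper (not part of Spark): prints the location (jar or
// directory) from which the JVM would load a given class file.
object WhichJar {
  // "breeze.linalg.Vector" becomes the resource path "breeze/linalg/Vector.class".
  def resourcePath(className: String): String =
    className.replace('.', '/') + ".class"

  // URL of the class file as seen by the system class loader, or None if absent.
  def locate(className: String): Option[java.net.URL] =
    Option(ClassLoader.getSystemClassLoader.getResource(resourcePath(className)))

  def main(args: Array[String]): Unit = {
    val name = args.headOption.getOrElse("breeze.linalg.Vector")
    locate(name) match {
      case Some(url) => println(s"$name found at $url")
      case None      => println(s"$name not found on the runtime classpath")
    }
  }
}
```

For a class inside a jar, the printed URL has the form jar:file:/path/to/some.jar!/breeze/linalg/Vector.class, which names the jar that has to stay on the classpath.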

On Wed, Jul 2, 2014 at 2:13 AM, Wanda Hawk wrote:
 I want to make some minor modifications to SparkKMeans.scala, so running
 the basic example won't do.
 I have also packaged my code into a jar file with sbt. The build completes
 successfully, but when I try to run it with java -jar myjar.jar I get the same
 Exception in thread "main" java.lang.NoClassDefFoundError:
 at java.lang.Class.getDeclaredMethods0(Native Method)
 at java.lang.Class.privateGetDeclaredMethods(
 at java.lang.Class.getMethod0(
 at java.lang.Class.getMethod(

 If scalac -d classes/ SparkKMeans.scala can't see my classpath, why does
 it compile successfully instead of giving the same error?
 NoClassDefFoundError itself means that the classes were available
 at compile time but, for some reason I cannot figure out, are not
 available at run time. Does anyone know why?

 Thank you

Re: SparkKMeans.scala from examples will show: NoClassDefFoundError: breeze/linalg/Vector

2014-07-02 Thread Wanda Hawk
I can run it now with the suggested method. However, I have encountered a new
problem that I have not faced before (I sent another email about it, but here
it is again...)

I ran SparkKMeans with a big file (~7 GB of data) for one iteration with
spark-0.8.0, with this line in .bashrc: export _JAVA_OPTIONS=-Xmx15g -Xms15g
-verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails. It finished in a
decent time, ~50 seconds, and I got only a few Full GC messages from
Java (at most 4-5).

Now, using the same export in .bashrc but with spark-1.0.0 (and running it
with spark-submit), the first loop never finishes and I get a lot of:
18.537: [GC (Allocation Failure) --[PSYoungGen: 11796992K->11796992K(13762560K)] 11797442K->11797450K(13763072K), 2.8420311 secs] [Times: user=5.81 sys=2.12, real=2.85 secs]

31.867: [Full GC (Ergonomics) [PSYoungGen: 11796992K->3177967K(13762560K)] [ParOldGen: 505K->505K(512K)] 11797497K->3178473K(13763072K), [Metaspace: 37646K->37646K(1081344K)], 2.3053283 secs] [Times: user=37.74 sys=0.11, real=2.31 secs]

I tried passing different parameters to the JVM through spark-submit, but the
results are the same.
This happens with Java 1.7 and also with Java 1.8.
I do not know what "Ergonomics" stands for...

How can I get decent performance from spark-1.0.0, considering that
spark-0.8.0 did not need any fine tuning of the garbage collection method (the
default worked well)?

Thank you
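To read the GC log lines quoted above: with -XX:+PrintGCDetails, HotSpot reports each generation as name: beforeK->afterK(capacityK). In the Allocation Failure entry, PSYoungGen holds 11796992K both before and after the collection, so that minor GC reclaimed nothing, while the Full GC entry brings it down to 3177967K. A small parsing sketch in plain Scala makes that arithmetic explicit; the GcLine object and its field names are hypothetical, and the regex covers only the named generation entries, not the unnamed whole-heap totals.

```scala
// Hypothetical helper for reading -XX:+PrintGCDetails output: extracts the
// "name: beforeK->afterK(capacityK)" generation entries from one log line.
object GcLine {
  final case class Gen(name: String, beforeK: Long, afterK: Long, capacityK: Long) {
    def reclaimedK: Long = beforeK - afterK
    // Occupancy after the collection, as a percentage of the generation's capacity.
    def occupancyAfterPct: Double = 100.0 * afterK / capacityK
  }

  // Requires a "Name: " prefix, so the unnamed whole-heap totals are skipped.
  private val GenPattern = """(\w+): (\d+)K->(\d+)K\((\d+)K\)""".r

  def parse(line: String): List[Gen] =
    GenPattern.findAllMatchIn(line).map { m =>
      Gen(m.group(1), m.group(2).toLong, m.group(3).toLong, m.group(4).toLong)
    }.toList

  def main(args: Array[String]): Unit = {
    val fullGc = "31.867: [Full GC (Ergonomics) [PSYoungGen: 11796992K->3177967K(13762560K)] " +
      "[ParOldGen: 505K->505K(512K)] 11797497K->3178473K(13763072K), " +
      "[Metaspace: 37646K->37646K(1081344K)], 2.3053283 secs]"
    for (g <- parse(fullGc))
      println(f"${g.name}%-10s reclaimed ${g.reclaimedK}%dK, ${g.occupancyAfterPct}%.1f%% of capacity in use afterwards")
  }
}
```

Applied to the Full GC entry, it reports 8619025K reclaimed from PSYoungGen and nothing from ParOldGen or Metaspace.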

On Wednesday, July 2, 2014 4:45 PM, Yana Kadiyska 

The scripts that Xiangrui mentions set up the classpath. Can you run
./run-example for the provided example successfully?

What you can try is to set SPARK_PRINT_LAUNCH_COMMAND=1 and then call
run-example; that will show you the exact java command used to run
the example at the start of execution. Assuming you can run examples
successfully, you should be able to just copy that command and add your jar
to the front of the classpath. If that works, you can start removing extra
jars (run-example puts all the example jars on the classpath, which you won't

As you said, the error you see indicates that the class is not
available/seen at runtime, but it's hard to tell why.

Re: SparkKMeans.scala from examples will show: NoClassDefFoundError: breeze/linalg/Vector

2014-07-01 Thread Xiangrui Meng
You can use either bin/run-example or bin/spark-submit to run example
code. scalac -d classes/ SparkKMeans.scala doesn't recognize the Spark
classpath. There are examples in the official doc:

On Tue, Jul 1, 2014 at 4:39 AM, Wanda Hawk wrote:

 I have installed spark-1.0.0 with Scala 2.10.3. I have built Spark with
 sbt/sbt assembly and added
 to my CLASSPATH variable.
 Then I went to
 ../spark-1.0.0/examples/src/main/scala/org/apache/spark/examples, created a
 new directory classes, and compiled SparkKMeans.scala with scalac -d
 classes/ SparkKMeans.scala
 Then I navigated to classes (I commented out this line in the Scala file:
 package org.apache.spark.examples) and tried to run it with java -cp .
 SparkKMeans, and I get the following error:
 Exception in thread "main" java.lang.NoClassDefFoundError:
 at java.lang.Class.getDeclaredMethods0(Native Method)
 at java.lang.Class.privateGetDeclaredMethods(
 at java.lang.Class.getMethod0(
 at java.lang.Class.getMethod(
 Caused by: java.lang.ClassNotFoundException: breeze.linalg.Vector
 at Method)
 at java.lang.ClassLoader.loadClass(
 at sun.misc.Launcher$AppClassLoader.loadClass(
 at java.lang.ClassLoader.loadClass(
 ... 6 more
 The jar under
 contains the breeze/linalg/Vector* path. I even tried to unpack it and put
 it in CLASSPATH, but it does not seem to pick it up.

 I am currently running Java 1.8:
 java version "1.8.0_05"
 Java(TM) SE Runtime Environment (build 1.8.0_05-b13)
 Java HotSpot(TM) 64-Bit Server VM (build 25.5-b02, mixed mode)

 What am I doing wrong?
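Regarding the java -cp . SparkKMeans attempt above: per the JDK documentation for the java launcher, specifying -cp or -classpath overrides the CLASSPATH environment variable, so with -cp . only the current directory is searched and the assembly jar listed in CLASSPATH is never consulted. A minimal sketch, assuming only the JDK and the Scala standard library (EnvVsEffectiveClasspath is a hypothetical name), that lists CLASSPATH entries missing from the classpath the JVM actually uses:

```scala
import java.io.File

// Hypothetical diagnostic: shows which entries of the CLASSPATH environment
// variable did not make it onto the JVM's effective classpath.
object EnvVsEffectiveClasspath {
  // Splits a classpath string on the platform separator (":" or ";").
  def entries(cp: String): Seq[String] =
    if (cp == null || cp.isEmpty) Seq.empty
    else cp.split(File.pathSeparator).toSeq.filter(_.nonEmpty)

  // Entries present in the environment variable but absent at run time.
  def ignored(envCp: String, effectiveCp: String): Seq[String] = {
    val effective = entries(effectiveCp).toSet
    entries(envCp).filterNot(effective.contains)
  }

  def main(args: Array[String]): Unit = {
    val envCp = System.getenv("CLASSPATH")                  // may be null
    val effectiveCp = System.getProperty("java.class.path") // what the JVM really uses
    println(s"CLASSPATH env:   $envCp")
    println(s"java.class.path: $effectiveCp")
    for (e <- ignored(envCp, effectiveCp))
      println(s"not on runtime classpath: $e")
  }
}
```

When the JVM is started with an explicit -cp, every CLASSPATH entry not repeated on the command line is reported as missing.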