I have given this a try in a spark-shell and I still get many "Allocation
Failure" messages.
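For what it's worth, spark-shell in 1.0 forwards options to spark-submit, so the driver heap can be sized there directly instead of through _JAVA_OPTIONS (the 15g below is just an example value):

```shell
# Give the spark-shell driver an explicit heap size
# (15g is an example value; adjust to the machine's RAM)
./bin/spark-shell --driver-memory 15g
```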


On Thursday, July 3, 2014 9:51 AM, Xiangrui Meng <men...@gmail.com> wrote:
 


The SparkKMeans is just an example code showing a barebone
implementation of k-means. To run k-means on big datasets, please use
the KMeans implemented in MLlib directly:
http://spark.apache.org/docs/latest/mllib-clustering.html
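A minimal sketch of what calling the MLlib KMeans looks like (the object name, input path, and the k/iteration values here are placeholders, and the parser assumes space-separated numeric lines):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

object KMeansOnMLlib {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("KMeansOnMLlib"))

    // Parse space-separated numeric lines into MLlib vectors
    // ("kmeans_data.txt", k = 2, and 20 iterations are placeholder values)
    val data = sc.textFile("kmeans_data.txt")
      .map(line => Vectors.dense(line.split(' ').map(_.toDouble)))
      .cache()

    val model = KMeans.train(data, 2, 20)
    model.clusterCenters.foreach(println)
    sc.stop()
  }
}
```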

-Xiangrui


On Wed, Jul 2, 2014 at 9:50 AM, Wanda Hawk <wanda_haw...@yahoo.com> wrote:
> I can run it now with the suggested method. However, I have encountered a
> new problem that I have not faced before (I sent another email about it,
> but here it is again ...)
>
> I ran SparkKMeans on a large file (~7 GB of data) for one iteration with
> spark-0.8.0, with this line in .bashrc: export _JAVA_OPTIONS="-Xmx15g
> -Xms15g -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails". It
> finished in a decent time, ~50 seconds, and I saw only a few "Full GC ..."
> messages from Java (a maximum of 4-5).
>
> Now, using the same export in .bashrc but with spark-1.0.0 (and running it
> with spark-submit), the first loop never finishes and I get a lot of:
> "18.537: [GC (Allocation Failure) --[PSYoungGen:
> 11796992K->11796992K(13762560K)] 11797442K->11797450K(13763072K), 2.8420311
> secs] [Times: user=5.81 sys=2.12, real=2.85 secs]
> "
> or
>
>  "31.867: [Full GC (Ergonomics) [PSYoungGen: 11796992K->3177967K(13762560K)]
> [ParOldGen: 505K->505K(512K)] 11797497K->3178473K(13763072K), [Metaspace:
> 37646K->37646K(1081344K)], 2.3053283 secs] [Times: user=37.74 sys=0.11,
> real=2.31 secs]"
>
> I tried passing different parameters to the JVM through spark-submit, but
> the results are the same.
> This happens with both Java 1.7 and Java 1.8.
> I do not know what "Ergonomics" stands for ...
>
> How can I get decent performance from spark-1.0.0, considering that
> spark-0.8.0 did not need any fine tuning of the garbage collection settings
> (the defaults worked well)?
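One thing worth trying instead of the _JAVA_OPTIONS export is passing the heap size and GC flags through spark-submit itself (the class and jar names below are placeholders; the GC flags are the ones from the message above):

```shell
# Pass driver heap and GC logging flags via spark-submit
# instead of a global _JAVA_OPTIONS export
./bin/spark-submit \
  --driver-memory 15g \
  --driver-java-options "-verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails" \
  --class SparkKMeans \
  myjar.jar
```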
>
> Thank you
>
>
> On Wednesday, July 2, 2014 4:45 PM, Yana Kadiyska <yana.kadiy...@gmail.com>
> wrote:
>
>
> The scripts that Xiangrui mentions set up the classpath. Can you run
> ./run-example for the provided example successfully?
>
> What you can try is to set SPARK_PRINT_LAUNCH_COMMAND=1 and then call
> run-example -- that will show you the exact java command used to run
> the example at the start of execution. Assuming you can run the examples
> successfully, you should be able to just copy that command and add your jar
> to the front of the classpath. If that works, you can start removing the
> extra jars (run-example puts all the example jars on the classpath, which
> you won't need).
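Concretely, the steps above look something like this (the example class name is taken from this thread; the arguments are placeholder values for the input file, k, and the convergence threshold):

```shell
# Print the exact java launch command, then run the bundled example
export SPARK_PRINT_LAUNCH_COMMAND=1
./bin/run-example org.apache.spark.examples.SparkKMeans kmeans_data.txt 2 0.1
```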
>
> As you said, the error you see indicates that the class is not
> available/seen at runtime, but it's hard to tell why.
>
> On Wed, Jul 2, 2014 at 2:13 AM, Wanda Hawk <wanda_haw...@yahoo.com> wrote:
>> I want to make some minor modifications in SparkKMeans.scala, so running
>> the basic example won't do.
>> I have also packed my code into a jar file with sbt. It builds
>> successfully, but when I try to run it with "java -jar myjar.jar" I get
>> the same error:
>> "Exception in thread "main" java.lang.NoClassDefFoundError:
>> breeze/linalg/Vector
>>        at java.lang.Class.getDeclaredMethods0(Native Method)
>>        at java.lang.Class.privateGetDeclaredMethods(Class.java:2531)
>>        at java.lang.Class.getMethod0(Class.java:2774)
>>        at java.lang.Class.getMethod(Class.java:1663)
>>        at
>> sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)
>>        at
>> sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)
>> "
>>
>> If "scalac -d classes/ SparkKMeans.scala" can't see my classpath, why
>> does it succeed in compiling and not give the same error?
>> The error itself, "NoClassDefFoundError", means that the classes were
>> available at compile time but, for some reason I cannot figure out, are
>> not available at run time. Does anyone know why?
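One likely culprit worth noting here: "java -jar" ignores both -cp and the CLASSPATH environment variable and uses only the jar's manifest Class-Path. Naming the main class and putting the assembly jar on an explicit classpath avoids that (the paths below are examples taken from this thread):

```shell
# "java -jar" ignores -cp and CLASSPATH; name the main class instead
# and put the Spark assembly on the classpath explicitly
java -cp myjar.jar:/home/wanda/spark-1.0.0/assembly/target/scala-2.10/spark-assembly-1.0.0-hadoop1.0.4.jar \
  SparkKMeans
```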
>>
>> Thank you
>>
>>
>> On Tuesday, July 1, 2014 7:03 PM, Xiangrui Meng <men...@gmail.com> wrote:
>>
>>
>> You can use either bin/run-example or bin/spark-submit to run example
>> code. "scalac -d classes/ SparkKMeans.scala" doesn't recognize the Spark
>> classpath. There are examples in the official doc:
>> http://spark.apache.org/docs/latest/quick-start.html#where-to-go-from-here
>> -Xiangrui
>>
>> On Tue, Jul 1, 2014 at 4:39 AM, Wanda Hawk <wanda_haw...@yahoo.com> wrote:
>>> Hello,
>>>
>>> I have installed spark-1.0.0 with Scala 2.10.3. I have built Spark with
>>> "sbt/sbt assembly" and added
>>>
>>>
>>> "/home/wanda/spark-1.0.0/assembly/target/scala-2.10/spark-assembly-1.0.0-hadoop1.0.4.jar"
>>> to my CLASSPATH variable.
>>> Then I went to
>>> "../spark-1.0.0/examples/src/main/scala/org/apache/spark/examples",
>>> created a new directory "classes", and compiled SparkKMeans.scala with
>>> "scalac -d classes/ SparkKMeans.scala".
>>> Then I navigated to "classes" (I commented out this line in the scala
>>> file: package org.apache.spark.examples ) and tried to run it with
>>> "java -cp . SparkKMeans", and I get the following error:
>>> "Exception in thread "main" java.lang.NoClassDefFoundError:
>>> breeze/linalg/Vector
>>>        at java.lang.Class.getDeclaredMethods0(Native Method)
>>>        at java.lang.Class.privateGetDeclaredMethods(Class.java:2531)
>>>        at java.lang.Class.getMethod0(Class.java:2774)
>>>        at java.lang.Class.getMethod(Class.java:1663)
>>>        at
>>> sun.launcher.LauncherHelper.getMainMethod(LauncherHelper.java:494)
>>>        at
>>> sun.launcher.LauncherHelper.checkAndLoadMain(LauncherHelper.java:486)
>>> Caused by: java.lang.ClassNotFoundException: breeze.linalg.Vector
>>>        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>>>        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>>>        at java.security.AccessController.doPrivileged(Native Method)
>>>        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>>>        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>>>        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>>        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>>>        ... 6 more
>>> "
>>> The jar under
>>>
>>>
>>> "/home/wanda/spark-1.0.0/assembly/target/scala-2.10/spark-assembly-1.0.0-hadoop1.0.4.jar"
>>> contains the breeze/linalg/Vector* path. I even tried to unpack it and
>>> put it on the CLASSPATH, but it does not seem to be picked up.
>>>
>>>
>>> I am currently running Java 1.8:
>>> "java version "1.8.0_05"
>>> Java(TM) SE Runtime Environment (build 1.8.0_05-b13)
>>> Java HotSpot(TM) 64-Bit Server VM (build 25.5-b02, mixed mode)"
>>>
>>> What am I doing wrong?
>>>
>>
>>
>
>
