Spark SQL reduce number of java threads

2014-10-28 Thread Wanda Hawk
Hello, I am trying to reduce the number of Java threads (about 80 on my system) to as few as possible. What settings can be changed in spark-1.1.0/conf/spark-env.sh (or elsewhere)? I am also using Hadoop for storing data on HDFS. Thank you, Wanda
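A hedged sketch of what might go in spark-env.sh, assuming Spark 1.1 in standalone mode. These settings are assumptions, not a guaranteed recipe: many of Spark's threads belong to Akka, the shuffle service, and broadcast machinery and cannot be removed entirely, but the task and dispatcher pools can be shrunk.

```sh
# Illustrative spark-1.1.0/conf/spark-env.sh settings (assumptions, not a recipe)

# Give each worker a single core so only one task thread runs at a time
SPARK_WORKER_CORES=1
SPARK_WORKER_INSTANCES=1

# Shrink the Akka dispatcher pool (spark.akka.threads defaults to 4)
SPARK_MASTER_OPTS="-Dspark.akka.threads=1"
SPARK_WORKER_OPTS="-Dspark.akka.threads=1"
```

Running with `MASTER=local[1]` similarly caps task parallelism at a single thread, which may matter more for tracing than the daemon-side pools.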

Re: Spark SQL reduce number of java threads

2014-10-28 Thread Wanda Hawk
I am trying to get a software trace, and I need to get the number of active threads as low as I can in order to inspect the active part of the workload. From: Prashant Sharma scrapco...@gmail.com To: Wanda Hawk wanda_haw...@yahoo.com Cc: user@spark.apache.org

Re: How can number of partitions be set in spark-env.sh?

2014-10-28 Thread Wanda Hawk
Is this what you are looking for? In Shark, the default reducer number is 1 and is controlled by the property mapred.reduce.tasks. Spark SQL deprecates this property in favor of spark.sql.shuffle.partitions, whose default value is 200. Users may customize this property via SET: SET
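The SET statement the snippet is cut off at presumably looked like the following; the property name and default come from the snippet itself, while the value 10 is an arbitrary illustration:

```sql
-- Lower the number of post-shuffle partitions for Spark SQL (default 200)
SET spark.sql.shuffle.partitions=10;
```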

Re: KMeans code is rubbish

2014-07-14 Thread Wanda Hawk
with this issue is to run kmeans multiple times and choose the best answer.  You can do this by changing the runs parameter from the default value (1) to something larger (say 10). -Ameet On Fri, Jul 11, 2014 at 1:20 AM, Wanda Hawk wanda_haw...@yahoo.com wrote: I also took a look at  spark-1.0.0

Re: KMeans code is rubbish

2014-07-11 Thread Wanda Hawk
, Wanda Hawk wanda_haw...@yahoo.com wrote: so this is what I am running: ./bin/run-example SparkKMeans ~/Documents/2dim2.txt 2 0.001 And this is the input file: $ cat ~/Documents/2dim2.txt 2 1 1 2 3 2 2 3 4 1 5 1 6 1 4 2 6 2 4 3 5 3

KMeans code is rubbish

2014-07-10 Thread Wanda Hawk
Can someone please run the standard kMeans code on this input with 2 centers?: 2 1 1 2 3 2 2 3 4 1 5 1 6 1 4 2 6 2 4 3 5 3 6 3 The obvious result should be (2,2) and (5,2) ... (you can draw them if you don't believe me ...) Thanks,  Wanda
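A minimal sketch in plain Scala (no Spark) of Lloyd's k-means on the 12 points from this post, with k = 2. It illustrates the point the thread converges on: the result depends entirely on initialization. A single unlucky start can settle on other centers, but restarting from every distinct pair of points and keeping the lowest-cost solution recovers the "obvious" centers (2,2) and (5,2). The object and function names are mine, not Spark's.

```scala
// Plain-Scala sketch of Lloyd's k-means with multiple restarts (k = 2).
object TinyKMeans {
  type Pt = (Double, Double)

  // The 12 points from the post (~/Documents/2dim2.txt)
  val points: Seq[Pt] = Seq(
    (2, 1), (1, 2), (3, 2), (2, 3), (4, 1), (5, 1),
    (6, 1), (4, 2), (6, 2), (4, 3), (5, 3), (6, 3)
  ).map { case (x, y) => (x.toDouble, y.toDouble) }

  def dist2(a: Pt, b: Pt): Double = {
    val dx = a._1 - b._1; val dy = a._2 - b._2
    dx * dx + dy * dy
  }

  def mean(ps: Seq[Pt]): Pt =
    (ps.map(_._1).sum / ps.size, ps.map(_._2).sum / ps.size)

  // One run of Lloyd's algorithm from the given initial centers.
  // Returns the converged centers and the total squared-distance cost.
  def lloyd(init: Seq[Pt]): (Seq[Pt], Double) = {
    var centers = init
    var converged = false
    while (!converged) {
      // Assign each point to its nearest center, then recompute the means
      val clusters = points.groupBy(p => centers.minBy(c => dist2(p, c)))
      val next = centers.map(c => clusters.get(c).map(mean).getOrElse(c))
      converged = next == centers
      centers = next
    }
    val cost = points.map(p => centers.map(c => dist2(p, c)).min).sum
    (centers, cost)
  }

  def main(args: Array[String]): Unit = {
    // Restart from every pair of distinct points; keep the cheapest solution.
    val (best, cost) = (for {
      i <- points.indices; j <- points.indices if i < j
    } yield lloyd(Seq(points(i), points(j)))).minBy(_._2)
    // The lowest-cost centers here are (2,2) and (5,2)
    println(s"centers: ${best.sortBy(_._1)}  cost: $cost")
  }
}
```

This best-of-many-restarts loop is, in spirit, what MLlib's `runs` parameter mentioned later in the thread does for you.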

Re: KMeans code is rubbish

2014-07-10 Thread Wanda Hawk
: A picture is worth a thousand... Well, a picture with this dataset, what you are expecting and what you get, would help answering your initial question. Bertrand On Thu, Jul 10, 2014 at 10:44 AM, Wanda Hawk wanda_haw...@yahoo.com wrote: Can someone please run the standard kMeans code

Re: KMeans code is rubbish

2014-07-10 Thread Wanda Hawk
) or something to try 10 times instead of once. On Thu, Jul 10, 2014 at 9:44 AM, Wanda Hawk wanda_haw...@yahoo.com wrote: Can someone please run the standard kMeans code on this input with 2 centers ?: 2 1 1 2 3 2 2 3 4 1 5 1 6 1 4 2 6 2 4 3 5 3 6 3 The obvious result should be (2,2

Re: KMeans code is rubbish

2014-07-10 Thread Wanda Hawk
: com.github.fommil.netlib.NativeRefBLAS Finished iteration (delta = 3.0) Finished iteration (delta = 0.0) Final centers: DenseVector(5.0, 2.0) DenseVector(2.0, 2.0) On Thu, Jul 10, 2014 at 2:17 AM, Wanda Hawk wanda_haw...@yahoo.com wrote: so this is what I am running:  ./bin/run-example SparkKMeans

Re: java options for spark-1.0.0

2014-07-03 Thread Wanda Hawk
like in both cases, your young generation is quite large (11 GB), which doesn't make a lot of sense with a heap of 15 GB. But maybe I'm misreading something. Matei On Jul 2, 2014, at 4:50 AM, Wanda Hawk wanda_haw...@yahoo.com wrote: I ran SparkKMeans with a big file (~ 7 GB of data) for one
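The point being made is that an 11 GB young generation inside a 15 GB heap leaves very little tenured space, forcing frequent full collections. A hedged sketch of explicitly capping the young generation with `-Xmn` (the sizes are illustrative, not a recommendation):

```sh
# Illustrative only: keep the 15 GB heap but cap the young generation
# at 2 GB, leaving ~13 GB for the old generation.
export _JAVA_OPTIONS="-Xmx15g -Xms15g -Xmn2g -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails"
```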

Re: SparkKMeans.scala from examples will show: NoClassDefFoundError: breeze/linalg/Vector

2014-07-03 Thread Wanda Hawk
the KMeans implemented in MLlib directly: http://spark.apache.org/docs/latest/mllib-clustering.html -Xiangrui On Wed, Jul 2, 2014 at 9:50 AM, Wanda Hawk wanda_haw...@yahoo.com wrote: I can run it now with the suggested method. However, I have encountered a new problem that I have not faced before

Re: SparkKMeans.scala from examples will show: NoClassDefFoundError: breeze/linalg/Vector

2014-07-02 Thread Wanda Hawk
-summit to run example code. scalac -d classes/ SparkKMeans.scala doesn't recognize Spark classpath. There are examples in the official doc: http://spark.apache.org/docs/latest/quick-start.html#where-to-go-from-here -Xiangrui On Tue, Jul 1, 2014 at 4:39 AM, Wanda Hawk wanda_haw...@yahoo.com wrote

Re: SparkKMeans.scala from examples will show: NoClassDefFoundError: breeze/linalg/Vector

2014-07-02 Thread Wanda Hawk
Got it ! Ran the jar with spark-submit. Thanks ! On Wednesday, July 2, 2014 9:16 AM, Wanda Hawk wanda_haw...@yahoo.com wrote: I want to make some minor modifications in the SparkKMeans.scala so running the basic example won't do.  I have also packed my code under a jar file with sbt
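The sequence that resolved this — package the modified example with sbt, then launch it through spark-submit — presumably looked something like the following. The jar name, main class, and master URL are hypothetical and depend on the project's build.sbt:

```sh
# Hypothetical project layout; the jar name comes from your build.sbt
sbt package
./bin/spark-submit \
  --class SparkKMeans \
  --master local[4] \
  target/scala-2.10/my-kmeans_2.10-1.0.jar \
  ~/Documents/2dim2.txt 2 0.001
```

Unlike invoking scalac directly, spark-submit puts the Spark assembly (including breeze) on the classpath, which is why it avoids the NoClassDefFoundError from this thread.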

java options for spark-1.0.0

2014-07-02 Thread Wanda Hawk
I ran SparkKMeans with a big file (~ 7 GB of data) for one iteration with spark-0.8.0 with this line in .bashrc: export _JAVA_OPTIONS="-Xmx15g -Xms15g -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails" . It finished in a decent time, ~50 seconds, and I had only a few Full GC messages

Re: SparkKMeans.scala from examples will show: NoClassDefFoundError: breeze/linalg/Vector

2014-07-02 Thread Wanda Hawk
said the error you see is indicative of the class not being available/seen at runtime, but it's hard to tell why. On Wed, Jul 2, 2014 at 2:13 AM, Wanda Hawk wanda_haw...@yahoo.com wrote: I want to make some minor modifications in the SparkKMeans.scala so running the basic example won't do. I have