Hello
I am trying to reduce the number of Java threads (about 80 on my system) to as
few as possible.
What settings can I change in spark-1.1.0/conf/spark-env.sh (or other places
as well)?
I am also using Hadoop for storing data on HDFS.
Thank you,
Wanda
I am trying to get a software trace, and I need to get the number of active
threads as low as I can in order to inspect the active part of the workload.
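For what it's worth, a sketch of the standalone-mode knobs usually involved (the variable names are the documented spark-env.sh ones; the residual thread count still depends on internal pools such as Akka dispatchers, shuffle, and GC threads, which these settings do not control):

```shell
# spark-1.1.0/conf/spark-env.sh -- minimize task-slot parallelism
export SPARK_WORKER_CORES=1       # one task slot per worker
export SPARK_WORKER_INSTANCES=1   # a single worker JVM per machine
```

For tracing, running the job with `./bin/spark-submit --master "local[1]" ...` is often simpler still: driver and executor collapse into one JVM with a single task thread.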
From: Prashant Sharma scrapco...@gmail.com
To: Wanda Hawk wanda_haw...@yahoo.com
Cc: user@spark.apache.org
Is this what you are looking for?
In Shark, the default reducer number is 1 and is controlled by the property
mapred.reduce.tasks. Spark SQL deprecates this property in favor of
spark.sql.shuffle.partitions, whose default value is 200. Users may customize
this property via SET:
SET spark.sql.shuffle.partitions=10;
A common way to deal with this issue
is to run KMeans multiple times and choose the best answer. You can do this by
changing the runs parameter from the default value (1) to something larger (say
10).
-Ameet
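The effect of the runs parameter can be illustrated in plain Scala (no Spark here; this is a hypothetical, self-contained reimplementation of Lloyd's algorithm on the 12-point dataset from this thread, not MLlib's code): each restart begins from different random centers, and the restart with the lowest total squared distance wins.

```scala
import scala.util.Random

object MultiRunKMeans {
  type Pt = (Double, Double)

  def dist2(a: Pt, b: Pt): Double = {
    val (dx, dy) = (a._1 - b._1, a._2 - b._2); dx * dx + dy * dy
  }

  // One Lloyd's run from random initial centers; returns (cost, centers).
  def kmeansOnce(pts: Seq[Pt], k: Int, rng: Random): (Double, Seq[Pt]) = {
    var centers = rng.shuffle(pts).take(k)
    for (_ <- 1 to 20) {
      val clusters = pts.groupBy(p => centers.minBy(c => dist2(p, c)))
      centers = centers.map(c => clusters.get(c).map { ps =>
        (ps.map(_._1).sum / ps.size, ps.map(_._2).sum / ps.size)
      }.getOrElse(c)) // a center that lost all its points stays put
    }
    val cost = pts.map(p => centers.map(c => dist2(p, c)).min).sum
    (cost, centers)
  }

  // Like MLlib's `runs` parameter: try several restarts, keep the cheapest.
  def kmeans(pts: Seq[Pt], k: Int, runs: Int, seed: Long): Seq[Pt] = {
    val rng = new Random(seed)
    (1 to runs).map(_ => kmeansOnce(pts, k, rng)).minBy(_._1)._2
  }

  def main(args: Array[String]): Unit = {
    val pts = Seq((2.0,1.0),(1.0,2.0),(3.0,2.0),(2.0,3.0),(4.0,1.0),(5.0,1.0),
                  (6.0,1.0),(4.0,2.0),(6.0,2.0),(4.0,3.0),(5.0,3.0),(6.0,3.0))
    kmeans(pts, 2, 10, seed = 42).sortBy(_._1).foreach(println)
  }
}
```

With a single run a bad initialization can get stuck in a local minimum; the best-of-`runs` result can never cost more than the first run alone.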
On Fri, Jul 11, 2014 at 1:20 AM, Wanda Hawk wanda_haw...@yahoo.com wrote:
I also took a look at
spark-1.0.0
Wanda Hawk wanda_haw...@yahoo.com wrote:
so this is what I am running:
./bin/run-example SparkKMeans
~/Documents/2dim2.txt 2 0.001
And this is the input file:
┌───[spark2013@SparkOne]──[~/spark-1.0.0].$
└───#!cat ~/Documents/2dim2.txt
2 1
1 2
3 2
2 3
4 1
5 1
6 1
4 2
6 2
4 3
5 3
Can someone please run the standard kMeans code on this input with 2 centers?:
2 1
1 2
3 2
2 3
4 1
5 1
6 1
4 2
6 2
4 3
5 3
6 3
The obvious result should be (2,2) and (5,2) ... (you can draw them if you
don't believe me ...)
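The expected centers can be checked with plain arithmetic: splitting the points into the left group (x ≤ 3) and the right group (x ≥ 4) and averaging each group (plain Scala, no Spark needed):

```scala
object CheckCenters {
  def mean(ps: Seq[(Double, Double)]): (Double, Double) =
    (ps.map(_._1).sum / ps.size, ps.map(_._2).sum / ps.size)

  def main(args: Array[String]): Unit = {
    val pts = Seq((2.0,1.0),(1.0,2.0),(3.0,2.0),(2.0,3.0),
                  (4.0,1.0),(5.0,1.0),(6.0,1.0),(4.0,2.0),
                  (6.0,2.0),(4.0,3.0),(5.0,3.0),(6.0,3.0))
    // Split at x = 3.5: four points on the left, eight on the right.
    val (left, right) = pts.partition(_._1 <= 3)
    println(mean(left))   // (2.0,2.0)
    println(mean(right))  // (5.0,2.0)
  }
}
```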
Thanks,
Wanda
A picture is worth a thousand words... Well, a picture with this dataset, what
you are expecting, and what you get would help answer your initial question.
Bertrand
On Thu, Jul 10, 2014 at 10:44 AM, Wanda Hawk wanda_haw...@yahoo.com wrote:
Can someone please run the standard kMeans code
) or
something to try 10 times instead of once.
On Thu, Jul 10, 2014 at 9:44 AM, Wanda Hawk wanda_haw...@yahoo.com wrote:
Can someone please run the standard kMeans code on this input with 2 centers?:
2 1
1 2
3 2
2 3
4 1
5 1
6 1
4 2
6 2
4 3
5 3
6 3
The obvious result should be (2,2) and (5,2)
:
com.github.fommil.netlib.NativeRefBLAS
Finished iteration (delta = 3.0)
Finished iteration (delta = 0.0)
Final centers:
DenseVector(5.0, 2.0)
DenseVector(2.0, 2.0)
On Thu, Jul 10, 2014 at 2:17 AM, Wanda Hawk wanda_haw...@yahoo.com wrote:
so this is what I am running:
./bin/run-example SparkKMeans
It looks like in both cases your
young generation is quite large (11 GB), which doesn't make a lot of sense with
a heap of 15 GB. But maybe I'm misreading something.
Matei
On Jul 2, 2014, at 4:50 AM, Wanda Hawk wanda_haw...@yahoo.com wrote:
I ran SparkKMeans with a big file (~ 7 GB of data) for one
Try the KMeans implemented in MLlib directly:
http://spark.apache.org/docs/latest/mllib-clustering.html
-Xiangrui
On Wed, Jul 2, 2014 at 9:50 AM, Wanda Hawk wanda_haw...@yahoo.com wrote:
I can run it now with the suggested method. However, I have encountered a
new problem that I have not faced before
Use spark-submit to run example
code. scalac -d classes/ SparkKMeans.scala doesn't recognize the Spark
classpath. There are examples in the official doc:
http://spark.apache.org/docs/latest/quick-start.html#where-to-go-from-here
-Xiangrui
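Concretely, the quick-start flow boils down to something like this (a sketch: the jar path and class name below are assumptions for a locally modified SparkKMeans, not the actual artifact names):

```shell
# Build the modified example with sbt, then hand the jar to spark-submit;
# bare scalac will not see the Spark classpath.
sbt package
./bin/spark-submit \
  --class SparkKMeans \
  --master "local[1]" \
  target/scala-2.10/sparkkmeans_2.10-1.0.jar \
  ~/Documents/2dim2.txt 2 0.001
```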
On Tue, Jul 1, 2014 at 4:39 AM, Wanda Hawk wanda_haw...@yahoo.com wrote:
Got it ! Ran the jar with spark-submit. Thanks !
On Wednesday, July 2, 2014 9:16 AM, Wanda Hawk wanda_haw...@yahoo.com wrote:
I want to make some minor modifications in SparkKMeans.scala, so running the
basic example won't do.
I have also packaged my code into a jar file with sbt.
I ran SparkKMeans with a big file (~7 GB of data) for one iteration with
spark-0.8.0, with this line in bash.rc: export _JAVA_OPTIONS="-Xmx15g -Xms15g
-verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails". It finished in a
decent time, ~50 seconds, and I had only a few Full GC messages.
said the error you see is indicative of the class not being
available/seen at runtime but it's hard to tell why.
On Wed, Jul 2, 2014 at 2:13 AM, Wanda Hawk wanda_haw...@yahoo.com wrote:
I want to make some minor modifications in SparkKMeans.scala, so running
the basic example won't do.