I am running Spark 1.0.0 with Java 1.8:

"java version "1.8.0_05"
Java(TM) SE Runtime Environment (build 1.8.0_05-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.5-b02, mixed mode)"

"which spark-shell
~/bench/spark-1.0.0/bin/spark-shell"

"which scala
~/bench/scala-2.10.4/bin/scala"


On Thursday, July 10, 2014 12:46 PM, Tathagata Das 
<tathagata.das1...@gmail.com> wrote:
 


I ran the SparkKMeans example (not the MLlib KMeans that Sean ran) with your 
dataset as well, and I got the expected answer. I believe that even though 
initialization is done by sampling, the example sets the seed to a 
constant 42, so the result should be the same no matter how many times 
you run it. So I am not really sure what's going on here.
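
As a side note on why initialization matters for this dataset: the sketch below runs plain Lloyd iterations in ordinary Python from two hand-picked starting points. It is not the SparkKMeans source. The function names (`sq_dist`, `mean`, `lloyd`, `cost`), the stopping rule, and both sets of initial centers are assumptions chosen only for illustration. It suggests that both answers reported in this thread are stable fixed points of the iteration, so which one appears would depend on the initial centers that get sampled.

```python
# Illustrative sketch only: plain Lloyd iterations, not the SparkKMeans source.
POINTS = [(2, 1), (1, 2), (3, 2), (2, 3), (4, 1), (5, 1),
          (6, 1), (4, 2), (6, 2), (4, 3), (5, 3), (6, 3)]


def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def mean(points):
    """Coordinate-wise mean of a non-empty list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def lloyd(points, centers, converge_dist=0.001, max_iter=100):
    """Iterate assign/update steps until the centers move less than converge_dist."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)),
                          key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster
        # (an empty cluster keeps its old center).
        new_centers = [mean(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        # Total squared movement of the centers in this iteration.
        delta = sum(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if delta <= converge_dist:
            break
    return centers


def cost(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


if __name__ == "__main__":
    # Hypothetical initial centers, picked by hand for illustration.
    for init in ([(1, 2), (6, 2)], [(3, 2), (6, 2)]):
        final = lloyd(POINTS, init)
        print(init, "->", final, "cost:", cost(POINTS, final))
```

Starting from (1, 2) and (6, 2), the iteration settles on (2, 2) and (5, 2) with a cost of 16. Starting from (3, 2) and (6, 2), it settles on roughly (2.857, 2) and (5.6, 2) with a cost of about 18.06, because the three points with x = 4 stay attached to the left cluster. The second result is a worse but still stable solution, which is consistent with the two outputs in this thread coming from different initial centers.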

Can you tell us more about which version of Spark you are running? Which Java 
version? 


======================================

[tdas @ Xion spark2] cat input
2 1
1 2
3 2
2 3
4 1
5 1
6 1
4 2
6 2
4 3
5 3
6 3
[tdas @ Xion spark2] ./bin/run-example SparkKMeans input 2 0.001
2014-07-10 02:45:06.764 java[45244:d17] Unable to load realm info from 
SCDynamicStore
14/07/10 02:45:07 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
14/07/10 02:45:07 WARN LoadSnappy: Snappy native library not loaded
14/07/10 02:45:08 WARN BLAS: Failed to load implementation from: 
com.github.fommil.netlib.NativeSystemBLAS
14/07/10 02:45:08 WARN BLAS: Failed to load implementation from: 
com.github.fommil.netlib.NativeRefBLAS
Finished iteration (delta = 3.0)
Finished iteration (delta = 0.0)
Final centers:
DenseVector(5.0, 2.0)
DenseVector(2.0, 2.0)




On Thu, Jul 10, 2014 at 2:17 AM, Wanda Hawk <wanda_haw...@yahoo.com> wrote:

So this is what I am running:
>"./bin/run-example SparkKMeans ~/Documents/2dim2.txt 2 0.001"
>
>
>And this is the input file:"
>┌───[spark2013@SparkOne]──────[~/spark-1.0.0].$
>└───#!cat ~/Documents/2dim2.txt
>2 1
>1 2
>3 2
>2 3
>4 1
>5 1
>6 1
>4 2
>6 2
>4 3
>5 3
>6 3
>"
>
>
>This is the final output from spark:
>"14/07/10 20:05:12 INFO BlockFetcherIterator$BasicBlockFetcherIterator: 
>Getting 2 non-empty blocks out of 2 blocks
>14/07/10 20:05:12 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 
>0 remote fetches in 0 ms
>14/07/10 20:05:12 INFO BlockFetcherIterator$BasicBlockFetcherIterator: 
>maxBytesInFlight: 50331648, targetRequestSize: 10066329
>14/07/10 20:05:12 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 
>2 non-empty blocks out of 2 blocks
>14/07/10 20:05:12 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 
>0 remote fetches in 0 ms
>14/07/10 20:05:12 INFO Executor: Serialized size of result for 14 is 1433
>14/07/10 20:05:12 INFO Executor: Sending result for 14 directly to driver
>14/07/10 20:05:12 INFO Executor: Finished task ID 14
>14/07/10 20:05:12 INFO DAGScheduler: Completed ResultTask(6, 0)
>14/07/10 20:05:12 INFO TaskSetManager: Finished TID 14 in 5 ms on localhost 
>(progress: 1/2)
>14/07/10 20:05:12 INFO Executor: Serialized size of result for 15 is 1433
>14/07/10 20:05:12 INFO Executor: Sending result for 15 directly to driver
>14/07/10 20:05:12 INFO Executor: Finished task ID 15
>14/07/10 20:05:12 INFO DAGScheduler: Completed ResultTask(6, 1)
>14/07/10 20:05:12 INFO TaskSetManager: Finished TID 15 in 7 ms on localhost 
>(progress: 2/2)
>14/07/10 20:05:12 INFO DAGScheduler: Stage 6 (collectAsMap at 
>SparkKMeans.scala:75) finished in 0.008 s
>14/07/10 20:05:12 INFO TaskSchedulerImpl: Removed TaskSet 6.0, whose tasks 
>have all completed, from pool
>14/07/10 20:05:12 INFO SparkContext: Job finished: collectAsMap at 
>SparkKMeans.scala:75, took 0.02472681 s
>Finished iteration (delta = 0.0)
>Final centers:
>DenseVector(2.8571428571428568, 2.0)
>DenseVector(5.6000000000000005, 2.0)
>"
>
>
>
>
>
>
>
>On Thursday, July 10, 2014 12:02 PM, Bertrand Dechoux <decho...@gmail.com> 
>wrote:
> 
>
>
>A picture is worth a thousand... Well, a picture of this dataset, showing what 
>you are expecting and what you get, would help answer your initial question.
>
>
>Bertrand
>
>
>On Thu, Jul 10, 2014 at 10:44 AM, Wanda Hawk <wanda_haw...@yahoo.com> wrote:
>
>Can someone please run the standard k-means code on this input with 2 centers?
>>2 1
>>1 2
>>3 2
>>2 3
>>4 1
>>5 1
>>6 1
>>4 2
>>6 2
>>4 3
>>5 3
>>6 3
>>
>>
>>The obvious result should be (2,2) and (5,2) ... (you can draw them if you 
>>don't believe me ...)
>>
>>
>>Thanks, 
>>Wanda
>
>
>
