I ran the SparkKMeans example (not the MLlib KMeans that Sean ran) with
your dataset as well, and I got the expected answer. And I believe that even
though initialization is done using sampling, the example actually sets the
seed to a constant 42, so the result should always be the same no matter
how many times you run it. So I am not really sure what's going on here.
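
For reference, the relevant part of the example looks roughly like the
sketch below (paraphrased from memory of examples/SparkKMeans.scala, not
verbatim; SeedCheck is just a throwaway name for this snippet). The point
is that takeSample is called with a fixed seed of 42, so for a given input
and Spark version the initial centers should be deterministic:

import org.apache.spark.{SparkConf, SparkContext}
import breeze.linalg.{DenseVector, Vector}

object SeedCheck {
  def parseVector(line: String): Vector[Double] =
    DenseVector(line.split(' ').map(_.toDouble))

  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("SeedCheck"))
    val data = sc.textFile(args(0)).map(parseVector _).cache()
    val K = args(1).toInt
    // Same call the example makes: no replacement, K samples, seed = 42.
    // The constant seed is why repeated runs should give the same centers.
    val kPoints = data.takeSample(withReplacement = false, K, 42)
    kPoints.foreach(println)
    sc.stop()
  }
}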

Can you tell us more about which version of Spark you are running? Which
Java version?


======================================

[tdas @ Xion spark2] cat input
2 1
1 2
3 2
2 3
4 1
5 1
6 1
4 2
6 2
4 3
5 3
6 3
[tdas @ Xion spark2] ./bin/run-example SparkKMeans input 2 0.001
2014-07-10 02:45:06.764 java[45244:d17] Unable to load realm info from
SCDynamicStore
14/07/10 02:45:07 WARN NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
14/07/10 02:45:07 WARN LoadSnappy: Snappy native library not loaded
14/07/10 02:45:08 WARN BLAS: Failed to load implementation from:
com.github.fommil.netlib.NativeSystemBLAS
14/07/10 02:45:08 WARN BLAS: Failed to load implementation from:
com.github.fommil.netlib.NativeRefBLAS
Finished iteration (delta = 3.0)
Finished iteration (delta = 0.0)
Final centers:
DenseVector(5.0, 2.0)
DenseVector(2.0, 2.0)
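
One more data point: the centers in your output below do look like a valid
k-means fixed point for this data, just a different local optimum than
(2,2)/(5,2). Splitting the points at x <= 4 versus x >= 5 gives means that
match your DenseVector(2.857..., 2.0) and DenseVector(5.6..., 2.0) up to
floating-point rounding. A quick plain-Scala check (no Spark involved;
CenterCheck is just a throwaway name):

object CenterCheck extends App {
  val points = Seq((2,1),(1,2),(3,2),(2,3),(4,1),(5,1),
                   (6,1),(4,2),(6,2),(4,3),(5,3),(6,3))
  // Partition at x <= 4 vs x >= 5 and take the mean of each side.
  val (left, right) = points.partition(_._1 <= 4)
  def mean(ps: Seq[(Int, Int)]) =
    (ps.map(_._1).sum.toDouble / ps.size, ps.map(_._2).sum.toDouble / ps.size)
  println(mean(left))   // (2.857142857142857, 2.0)
  println(mean(right))  // (5.6, 2.0)
}

So the two runs most likely just started from different initial centers,
which is what makes the fixed seed above so puzzling.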



On Thu, Jul 10, 2014 at 2:17 AM, Wanda Hawk <wanda_haw...@yahoo.com> wrote:

> so this is what I am running:
> "./bin/run-example SparkKMeans ~/Documents/2dim2.txt 2 0.001"
>
> And this is the input file:"
> ┌───[spark2013@SparkOne]──────[~/spark-1.0.0].$
> └───#!cat ~/Documents/2dim2.txt
> 2 1
> 1 2
> 3 2
> 2 3
> 4 1
> 5 1
> 6 1
> 4 2
> 6 2
> 4 3
> 5 3
> 6 3
> "
>
> This is the final output from spark:
> "14/07/10 20:05:12 INFO BlockFetcherIterator$BasicBlockFetcherIterator:
> Getting 2 non-empty blocks out of 2 blocks
> 14/07/10 20:05:12 INFO BlockFetcherIterator$BasicBlockFetcherIterator:
> Started 0 remote fetches in 0 ms
> 14/07/10 20:05:12 INFO BlockFetcherIterator$BasicBlockFetcherIterator:
> maxBytesInFlight: 50331648, targetRequestSize: 10066329
> 14/07/10 20:05:12 INFO BlockFetcherIterator$BasicBlockFetcherIterator:
> Getting 2 non-empty blocks out of 2 blocks
> 14/07/10 20:05:12 INFO BlockFetcherIterator$BasicBlockFetcherIterator:
> Started 0 remote fetches in 0 ms
> 14/07/10 20:05:12 INFO Executor: Serialized size of result for 14 is 1433
> 14/07/10 20:05:12 INFO Executor: Sending result for 14 directly to driver
> 14/07/10 20:05:12 INFO Executor: Finished task ID 14
> 14/07/10 20:05:12 INFO DAGScheduler: Completed ResultTask(6, 0)
> 14/07/10 20:05:12 INFO TaskSetManager: Finished TID 14 in 5 ms on
> localhost (progress: 1/2)
> 14/07/10 20:05:12 INFO Executor: Serialized size of result for 15 is 1433
> 14/07/10 20:05:12 INFO Executor: Sending result for 15 directly to driver
> 14/07/10 20:05:12 INFO Executor: Finished task ID 15
> 14/07/10 20:05:12 INFO DAGScheduler: Completed ResultTask(6, 1)
> 14/07/10 20:05:12 INFO TaskSetManager: Finished TID 15 in 7 ms on
> localhost (progress: 2/2)
> 14/07/10 20:05:12 INFO DAGScheduler: Stage 6 (collectAsMap at
> SparkKMeans.scala:75) finished in 0.008 s
> 14/07/10 20:05:12 INFO TaskSchedulerImpl: Removed TaskSet 6.0, whose tasks
> have all completed, from pool
> 14/07/10 20:05:12 INFO SparkContext: Job finished: collectAsMap at
> SparkKMeans.scala:75, took 0.02472681 s
> Finished iteration (delta = 0.0)
> Final centers:
> DenseVector(2.8571428571428568, 2.0)
> DenseVector(5.6000000000000005, 2.0)
> "
>
>   On Thursday, July 10, 2014 12:02 PM, Bertrand Dechoux <
> decho...@gmail.com> wrote:
>
>
> A picture is worth a thousand... Well, a picture with this dataset, what
> you are expecting, and what you get would help answer your initial
> question.
>
> Bertrand
>
>
> On Thu, Jul 10, 2014 at 10:44 AM, Wanda Hawk <wanda_haw...@yahoo.com>
> wrote:
>
> Can someone please run the standard kMeans code on this input with 2
> centers?
> 2 1
> 1 2
> 3 2
> 2 3
> 4 1
> 5 1
> 6 1
> 4 2
> 6 2
> 4 3
> 5 3
> 6 3
>
> The obvious result should be (2,2) and (5,2) ... (you can draw them if you
> don't believe me ...)
>
> Thanks,
>  Wanda
>
>
