So this is what I am running:

./bin/run-example SparkKMeans ~/Documents/2dim2.txt 2 0.001

And this is the input file:

┌───[spark2013@SparkOne]──────[~/spark-1.0.0].$
└───#!cat ~/Documents/2dim2.txt
2 1
1 2
3 2
2 3
4 1
5 1
6 1
4 2
6 2
4 3
5 3
6 3

This is the final output from spark:

14/07/10 20:05:12 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
14/07/10 20:05:12 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 0 ms
14/07/10 20:05:12 INFO BlockFetcherIterator$BasicBlockFetcherIterator: maxBytesInFlight: 50331648, targetRequestSize: 10066329
14/07/10 20:05:12 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Getting 2 non-empty blocks out of 2 blocks
14/07/10 20:05:12 INFO BlockFetcherIterator$BasicBlockFetcherIterator: Started 0 remote fetches in 0 ms
14/07/10 20:05:12 INFO Executor: Serialized size of result for 14 is 1433
14/07/10 20:05:12 INFO Executor: Sending result for 14 directly to driver
14/07/10 20:05:12 INFO Executor: Finished task ID 14
14/07/10 20:05:12 INFO DAGScheduler: Completed ResultTask(6, 0)
14/07/10 20:05:12 INFO TaskSetManager: Finished TID 14 in 5 ms on localhost (progress: 1/2)
14/07/10 20:05:12 INFO Executor: Serialized size of result for 15 is 1433
14/07/10 20:05:12 INFO Executor: Sending result for 15 directly to driver
14/07/10 20:05:12 INFO Executor: Finished task ID 15
14/07/10 20:05:12 INFO DAGScheduler: Completed ResultTask(6, 1)
14/07/10 20:05:12 INFO TaskSetManager: Finished TID 15 in 7 ms on localhost (progress: 2/2)
14/07/10 20:05:12 INFO DAGScheduler: Stage 6 (collectAsMap at SparkKMeans.scala:75) finished in 0.008 s
14/07/10 20:05:12 INFO TaskSchedulerImpl: Removed TaskSet 6.0, whose tasks have all completed, from pool
14/07/10 20:05:12 INFO SparkContext: Job finished: collectAsMap at SparkKMeans.scala:75, took 0.02472681 s
Finished iteration (delta = 0.0)
Final centers:
DenseVector(2.8571428571428568, 2.0)
DenseVector(5.6000000000000005, 2.0)

On Thursday, July 10, 2014 12:02 PM, Bertrand Dechoux <decho...@gmail.com> wrote:

A picture is worth a thousand... Well, a picture with this dataset, what you are expecting and what you get, would help answering your initial question.

Bertrand

On Thu, Jul 10, 2014 at 10:44 AM, Wanda Hawk <wanda_haw...@yahoo.com> wrote:

Can someone please run the standard kMeans code on this input with 2 centers ?:
>2 1
>1 2
>3 2
>2 3
>4 1
>5 1
>6 1
>4 2
>6 2
>4 3
>5 3
>6 3
>
>
>The obvious result should be (2,2) and (5,2) ... (you can draw them if you don't believe me ...)
>
>
>Thanks,
>Wanda
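
For anyone who wants to compare the two answers without drawing them, here is a minimal plain-Python sketch (not the SparkKMeans code; the helper names sq_dist, assign, lloyd_step and cost are made up for this illustration). It takes the 12 points from the file, applies one assign-and-average step to each pair of centers, and computes the within-cluster sum of squared distances. It assumes no cluster ends up empty, which holds for both pairs tested here.

```python
# Points copied from 2dim2.txt as posted in the thread.
POINTS = [(2, 1), (1, 2), (3, 2), (2, 3), (4, 1), (5, 1),
          (6, 1), (4, 2), (6, 2), (4, 3), (5, 3), (6, 3)]


def sq_dist(p, c):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2


def assign(points, centers):
    """Group points by the index of their nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters


def lloyd_step(points, centers):
    """One assignment step followed by one mean update.

    Assumes every center keeps at least one point (true for the inputs below).
    """
    clusters = assign(points, centers)
    return [(sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            for c in clusters]


def cost(points, centers):
    """Within-cluster sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


expected = [(2.0, 2.0), (5.0, 2.0)]                                # from the original question
reported = [(2.8571428571428568, 2.0), (5.6000000000000005, 2.0)]  # from the Spark log

for name, centers in (("expected", expected), ("reported", reported)):
    updated = lloyd_step(POINTS, centers)
    moved = max(sq_dist(a, b) for a, b in zip(centers, updated))
    sizes = [len(c) for c in assign(POINTS, centers)]
    print(name, "cluster sizes:", sizes,
          "largest squared move:", moved,
          "cost:", round(cost(POINTS, centers), 3))
```

On this data neither pair moves under a further step (up to floating-point rounding), so both are points where the iteration can stop. The (2,2)/(5,2) pair splits the points 4/8 and has cost 16, while the pair in the Spark log splits them 7/5 and has cost of about 18.06. That would be consistent with the run settling into a worse local optimum, which depends on the initial centers; the thread does not show how those were chosen.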