I've noticed (and occasionally been bothered by) this too, so I just decided 
to fix it:
https://github.com/JuliaStats/Clustering.jl/pull/35
https://github.com/JuliaStats/Distances.jl/pull/9

It seems that kmeans was optimized for very high dimensions, but performed 
poorly on low-dimensional data. We'll see what the reaction is.

Even with these changes I suspect one could do better still, but this is at
least a start.

Best,
--Tim


On Sunday, January 25, 2015 09:57:24 AM Martin Kapfhammer wrote:
> using DataFrames
> using Clustering
> 
> 
> raw_data = readtable("10000000x2s10.csv", header=false,
> eltypes=[Float64,Float64])
> 
> matrix = transpose(array(raw_data))
> 
> k = 3
> 
> for i = 1:10
>     print("new round ")
>     println(i)
> 
>     #@time measuring
>     @time result = kmeans(matrix, k)
>     print("totalcost ")
>     println(result.totalcost)
>     print("iterations ")
>     println(result.iterations)
>     print("converged ")
>     println(result.converged)
> end
> 
> 
> println("test run done")
