I was trying out SeqAccessSparseVector with Canopy Clustering using Manhattan
distance and found the performance to be really bad, so I profiled it with
YourKit (thanks a lot for providing us with a free license).

Since I was using Manhattan distance, there were a lot of A-B operations,
which created a lot of clone operations (5% of the total time).
There were also many A+B operations for adding a point to the canopy so it
could be averaged. These also created a lot of clone operations (90% of the
total time).
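For illustration only, here is a minimal sketch of how a Manhattan distance between two sequential-access sparse vectors can be computed without materializing A-B at all. It assumes a hypothetical `SortedSparse` class (parallel index/value arrays sorted by index); it is not Mahout's SequentialAccessSparseVector or its API. The two index lists are merged in one pass, so no temporary vector is cloned.

```java
/**
 * Hypothetical stand-in for a sequential-access sparse vector (NOT Mahout's
 * class): non-zero entries kept as parallel arrays sorted by index.
 */
final class SortedSparse {
    final int[] indices;   // strictly increasing positions of non-zeros
    final double[] values; // values[k] is the entry at indices[k]

    SortedSparse(int[] indices, double[] values) {
        if (indices.length != values.length) {
            throw new IllegalArgumentException("indices/values length mismatch");
        }
        this.indices = indices;
        this.values = values;
    }

    /**
     * Manhattan distance sum_i |a_i - b_i|, computed by merging the two sorted
     * index arrays so that no temporary difference vector is allocated.
     */
    static double manhattan(SortedSparse a, SortedSparse b) {
        int i = 0, j = 0;
        double sum = 0.0;
        while (i < a.indices.length && j < b.indices.length) {
            if (a.indices[i] == b.indices[j]) {
                sum += Math.abs(a.values[i++] - b.values[j++]); // both non-zero
            } else if (a.indices[i] < b.indices[j]) {
                sum += Math.abs(a.values[i++]); // b is zero at this index
            } else {
                sum += Math.abs(b.values[j++]); // a is zero at this index
            }
        }
        // Drain whichever vector still has entries left.
        while (i < a.indices.length) sum += Math.abs(a.values[i++]);
        while (j < b.indices.length) sum += Math.abs(b.values[j++]);
        return sum;
    }
}
```

For example, with a = {0: 1, 3: 2} and b = {3: 5, 7: 1}, the merge visits index 0, then 3, then 7, giving |1| + |2 - 5| + |1| = 5.0.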

So we definitely need to improve that.

As a small hack, I made the cluster centers RandomAccess Vectors, and things
are fast again. I don't know whether to commit it or not, but it's something
to look into for 0.4?
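A rough sketch of why a random-access center helps, under stated assumptions: the hypothetical `CenterAccumulator` below (not a Mahout class) keeps the running sum in a `HashMap`, so adding a sparse point costs one in-place hash update per non-zero entry instead of building a new vector for every A+B. The average is only computed when it is read.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical running-sum canopy center backed by a hash map (random access).
 * Points arrive as sorted index/value arrays; each add updates the map in
 * place rather than creating a new vector for sum + point.
 */
final class CenterAccumulator {
    private final Map<Integer, Double> sum = new HashMap<>();
    private int count = 0;

    /** Adds one sparse point in place: one hash update per non-zero entry. */
    void add(int[] indices, double[] values) {
        for (int k = 0; k < indices.length; k++) {
            sum.merge(indices[k], values[k], Double::sum);
        }
        count++;
    }

    /** Average value at index i; 0.0 if the index was never set or no points were added. */
    double centerAt(int i) {
        if (count == 0) {
            return 0.0;
        }
        return sum.getOrDefault(i, 0.0) / count;
    }

    int count() {
        return count;
    }
}
```

The trade-off is that a hash map gives cheap random updates but slower ordered iteration, which is roughly the mirror image of a sequential-access layout.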

Robin
