Re: Clustering from DB

nfantone Mon, 27 Jul 2009 11:34:05 -0700

> Well, it does matter to some degree since picking random vectors tends to 
> give you dense vectors whereas text gives you very sparse vectors.


> Different patterns of sparsity can cause radically different time complexity
> for the clustering.

I have yet to find a random combination of vectors that substantially
improves the performance of kMeans. I have also tried real datasets
(like the one I was initially using, built from large amounts of data
describing consumers' buying habits) to no avail. How should a
collection of vectors be created so that, say, it does not
significantly compromise the algorithm?
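
To make the sparsity point in the quote concrete, here is a minimal
sketch of why dense random vectors and sparse text-like vectors can
cost very different amounts per distance computation. This is not
Mahout code: the function names, the dict-based vector storage, the
dimension (10000) and the number of nonzeros (20) are all assumptions
chosen for illustration.

```python
import random

def random_dense_vector(dim, rng):
    # Every component gets a uniform random value: dim stored entries.
    return {i: rng.random() for i in range(dim)}

def random_sparse_vector(dim, nonzeros, rng):
    # Only `nonzeros` randomly chosen components are stored, as with text.
    return {i: rng.random() for i in rng.sample(range(dim), nonzeros)}

def squared_distance(point, centroid, centroid_sq_norm):
    # ||p - c||^2 = ||c||^2 + sum over stored p_i of (p_i^2 - 2*p_i*c_i),
    # so the loop only visits the entries the point actually stores.
    total = centroid_sq_norm
    for i, p in point.items():
        total += p * p - 2.0 * p * centroid[i]
    return total

rng = random.Random(42)
dim = 10000  # illustrative dimension, not taken from any real dataset
centroid = [rng.random() for _ in range(dim)]
centroid_sq_norm = sum(c * c for c in centroid)  # computed once per centroid

dense = random_dense_vector(dim, rng)
sparse = random_sparse_vector(dim, 20, rng)

for name, vec in (("dense", dense), ("sparse", sparse)):
    d = squared_distance(vec, centroid, centroid_sq_norm)
    print(name, "entries visited:", len(vec), "squared distance:", round(d, 3))
```

With the centroid norm cached, the work per point is proportional to
the number of stored entries, so the dense random vector visits all
10000 components while the sparse one visits 20. Whether a given
kMeans implementation exploits this depends on how it stores vectors
and computes distances, which I am not assuming anything about here.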
