> Well, it does matter to some degree since picking random vectors tends to 
> give you dense vectors whereas text gives you very sparse vectors.

> Different patterns of sparsity can cause radically different time complexity
> for the clustering.

I have yet to find a random combination of vectors that substantially
improves the performance of kMeans. I have also tried real datasets
(such as the one I was initially using, a large dataset describing
consumers' buying habits), to no avail. How should a collection of
vectors be created so that it does not, say, significantly compromise
the algorithm's functionality?
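
To make the dense-versus-sparse distinction from the quoted text concrete,
here is a minimal sketch in plain Python of the two kinds of random vectors
being discussed. The function names, the dimension of 1000, and the choice of
10 non-zero entries are all assumptions made up for illustration; this is not
the code I ran and it does not use any clustering library.

```python
import random

def random_dense_vector(dim, rng):
    # Every component gets a random value, so almost all entries are non-zero.
    return [rng.random() for _ in range(dim)]

def random_sparse_vector(dim, nnz, rng):
    # Only `nnz` randomly chosen positions get a value; the rest are
    # implicitly zero. Stored as {index: value}, similar in spirit to how
    # text vectors are usually kept.
    indices = rng.sample(range(dim), nnz)
    return {i: rng.random() for i in indices}

def density(vec, dim):
    # Fraction of components that are non-zero.
    if isinstance(vec, dict):
        return len(vec) / dim
    return sum(1 for x in vec if x != 0.0) / dim

rng = random.Random(42)  # fixed seed so runs are repeatable
dense = random_dense_vector(1000, rng)
sparse = random_sparse_vector(1000, 10, rng)

print("dense density: ", density(dense, 1000))
print("sparse density:", density(sparse, 1000))
```

Varying the hypothetical `nnz` parameter would be one way to generate test
collections with different sparsity, although uniformly random positions
still do not reproduce the skewed sparsity patterns of real text.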
