I think the bigger issue here is that we are doing extra work to calculate
distances. I'd suggest hanging on a few days to see if we can get that
straightened out.
On Jul 27, 2009, at 2:33 PM, nfantone wrote:
> Well, it does matter to some degree since picking random vectors
> tends to give you dense vectors whereas text gives you very sparse
> vectors. Different patterns of sparsity can cause radically
> different time complexity for the clustering.
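The point about sparsity and time complexity can be illustrated with a minimal sketch. This is not Mahout's actual implementation (Mahout has its own DenseVector and sparse vector classes); it is just a plain-Java illustration, with a sparse vector modeled as an index-to-value map, of why a dot product over a text-like vector scales with the number of stored non-zeros rather than the full dimensionality:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: compares the cost model of dense vs. sparse
// dot products, the core operation inside a Euclidean distance.
public class SparsitySketch {

    // Dense dot product: touches every coordinate, so the cost is
    // O(dimension) even when most entries are zero.
    static double denseDot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Sparse dot product: iterates only over the non-zeros stored in a,
    // so the cost is O(non-zeros of a) -- a large win for text vectors,
    // where almost all of a high-dimensional vector is zero.
    static double sparseDot(Map<Integer, Double> a, Map<Integer, Double> b) {
        double sum = 0.0;
        for (Map.Entry<Integer, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {
                sum += e.getValue() * other;
            }
        }
        return sum;
    }
}
```

Randomly generated vectors tend to have a non-zero in nearly every coordinate, so they always pay the dense O(dimension) cost; that is one reason random data can behave very differently from real text data in timing experiments.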
I have yet to find a random combination of vectors that substantially
benefits the performance of kMeans. I have also tried real datasets
(like the one I was initially using, drawn from a large amount of data
describing consumers' buying habits) to no avail. How should a
collection of vectors be created so as not to significantly compromise
the algorithm's functionality?
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search