On Jul 27, 2009, at 12:03 PM, Ted Dunning wrote:
Yes.
That explains why Jeff didn't see the slowdown with dense vectors.
Not following. The distance calculation is independent of the type of
Vector. I was referring to the centroid length square optimization (I
think you called it the triangle inequality) that Shashikant added on
MAHOUT-121. We use it for testing convergence, but not for other
distance calculations. I haven't looked yet to see whether it is
applicable there, but it seems like it should be.
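
For reference, here is a rough sketch of what reusing a cached centroid length square for the nearest-cluster search could look like. It assumes the optimization amounts to expanding the squared Euclidean distance as |c|^2 - 2(c . x) + |x|^2, so that |c|^2 is computed once per centroid rather than once per point. The class and method names (CachedNormDistance, squaredDistance, nearestCluster) are hypothetical and are not Mahout's actual API; plain double[] arrays stand in for Mahout's Vector.

```java
// Hypothetical illustration only; not the code from MAHOUT-121.
public final class CachedNormDistance {

    private CachedNormDistance() {}

    /** Dot product of two equal-length dense vectors. */
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    /**
     * Squared Euclidean distance via |c|^2 - 2(c . x) + |x|^2.
     * Both length squares are passed in so they can be cached by the caller.
     */
    static double squaredDistance(double[] centroid, double centroidLengthSquared,
                                  double[] point, double pointLengthSquared) {
        double d = centroidLengthSquared - 2.0 * dot(centroid, point) + pointLengthSquared;
        // Rounding can push a near-zero distance slightly negative.
        return Math.max(d, 0.0);
    }

    /**
     * Index of the centroid closest to the point. centroidLengthSquared[i]
     * must hold dot(centroids[i], centroids[i]), computed once per iteration.
     */
    static int nearestCluster(double[][] centroids, double[] centroidLengthSquared,
                              double[] point) {
        double pointLengthSquared = dot(point, point); // once per point
        int best = -1;
        double bestDistance = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centroids.length; i++) {
            double d = squaredDistance(centroids[i], centroidLengthSquared[i],
                                       point, pointLengthSquared);
            if (d < bestDistance) {
                bestDistance = d;
                best = i;
            }
        }
        return best;
    }
}
```

With this form, the per-pair work is a single dot product. Whether that actually helps in emitPointToNearestCluster would need to be confirmed by profiling.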
On Mon, Jul 27, 2009 at 8:03 AM, Grant Ingersoll <[email protected]> wrote:
Hmm, some profiling shows the pain is in the distance calculation for
emitPointToNearestCluster. Seems that we only use the optimized distance
calculations for testing convergence, but shouldn't we also use it for
calculating the distances to the cluster, too?
--
Ted Dunning, CTO
DeepDyve
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search