Jake,

The distance optimization was done in MAHOUT-121:
http://issues.apache.org/jira/browse/MAHOUT-121

The idea is described neatly on the LingPipe blog:
http://lingpipe-blog.com/2009/03/12/speeding-up-k-means-clustering-algebra-sparse-vectors/

I will go through the conversation between you and Ted, and chip in
wherever needed.

--shashi

On Wed, Jan 27, 2010 at 11:42 PM, Jake Mannix <jake.man...@gmail.com> wrote:
> The interface defines two methods:
>
>   double distance(Vector v1, Vector v2);
>   double distance(double centroidLengthSquare, Vector centroid, Vector v);
>
> With the latter being an optimized form of the former, and satisfies:
>
>   distance(v1, v2) == distance(v1.getLengthSquared(), v1, v2)
>
> Is this correct?  Every place I see this method called, it is used in this
> fashion, at least...
>
>   -jake
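
For reference, the algebra behind the three-argument form is the identity
|c - v|^2 = |c|^2 - 2 (c . v) + |v|^2, which lets the centroid's squared
length be computed once and reused for every point. Below is a minimal
sketch of that identity for the squared Euclidean case only. It assumes
plain dense double[] arrays, and the class and helper names are
hypothetical; it is not Mahout's implementation, where the gain comes from
sparse vectors.

```java
// Illustrative sketch only: hypothetical names, dense arrays, squared Euclidean distance.
class SquaredDistanceSketch {

    // Dot product of two equal-length vectors.
    public static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Squared length |a|^2, i.e. the dot product of a with itself.
    public static double lengthSquared(double[] a) {
        return dot(a, a);
    }

    // Direct form: sum of squared component differences.
    public static double distance(double[] v1, double[] v2) {
        double sum = 0.0;
        for (int i = 0; i < v1.length; i++) {
            double d = v1[i] - v2[i];
            sum += d * d;
        }
        return sum;
    }

    // Optimized form: the caller supplies the precomputed |centroid|^2,
    // so only the dot product and |v|^2 are computed per call.
    public static double distance(double centroidLengthSquare, double[] centroid, double[] v) {
        return centroidLengthSquare - 2.0 * dot(centroid, v) + lengthSquared(v);
    }

    public static void main(String[] args) {
        double[] centroid = {1.0, 2.0, 3.0};
        double[] point = {4.0, 0.0, 3.0};
        double cached = lengthSquared(centroid); // computed once per centroid
        System.out.println(distance(centroid, point));
        System.out.println(distance(cached, centroid, point));
    }
}
```

Under these assumptions both calls agree up to floating-point rounding,
which is the property distance(v1, v2) == distance(v1.getLengthSquared(),
v1, v2) quoted above. A measure that returns the unsquared Euclidean
distance would take the square root of the same expression.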