On Wed, Jan 27, 2010 at 10:33 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> Yes. That is correct.
>
> I had wanted vectors to cache this squared length, but the consensus was
> that the caller could do it more correctly.
>

Well, vectors *are* caching their squared length (see MAHOUT-208), for one
thing. For another thing, the implementation of this optimization looks
completely wrong, in SquaredEuclideanDistanceMeasure (and hence also in
EuclideanDistanceMeasure):

  @Override
  public double distance(Vector v1, Vector v2) {
    if (v1.size() != v2.size()) {
      throw new CardinalityException();
    }
    Vector vector = v1.minus(v2);
    return vector.dot(vector);
  }

So far so good - not terribly optimized here, because a new vector is
created, but mathematically correct (how is this different than just doing
v1.getDistanceSquared(v2), however?).

But then what is this trying to say:

  @Override
  public double distance(double centroidLengthSquare, Vector centroid, Vector v) {
    if (centroid.size() != v.size()) {
      throw new CardinalityException();
    }
    return centroidLengthSquare + v.getDistanceSquared(centroid);
  }

If you're calling this as

  distance(centroid.getLengthSquared(), centroid, v),

then this resolves to

  centroid.getLengthSquared() + v.getDistanceSquared(centroid)

Why is this supposed to be equal to

  centroid.minus(v).getLengthSquared()?

  -jake
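
P.S. To make the question concrete, here is a minimal standalone sketch. It uses plain double[] arrays instead of Mahout's Vector, and the class and method names (SquaredDistanceSketch, direct, expanded, asPosted) are made up for illustration. It assumes the optimization was meant to rely on the standard expansion |c - v|^2 = |c|^2 - 2 (c . v) + |v|^2; I can't confirm that intent from the code. Under that assumption, a cached |c|^2 has to be combined with a dot product, not with a second full squared distance.

```java
public final class SquaredDistanceSketch {

    // Plain dot product of two equal-length arrays.
    public static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("cardinality mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Direct |a - b|^2, the mathematical equivalent of minus() followed by dot().
    public static double direct(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("cardinality mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Assumed intent: |c|^2 - 2 (c . v) + |v|^2, with |c|^2 supplied by the caller.
    public static double expanded(double centroidLengthSquared, double[] centroid, double[] v) {
        return centroidLengthSquared - 2.0 * dot(centroid, v) + dot(v, v);
    }

    // Mirrors the arithmetic of the three-argument method quoted above:
    // |c|^2 + |v - c|^2, which is not a distance.
    public static double asPosted(double centroidLengthSquared, double[] centroid, double[] v) {
        return centroidLengthSquared + direct(v, centroid);
    }

    public static void main(String[] args) {
        double[] centroid = {1.0, 2.0, 3.0};
        double[] v = {4.0, 6.0, 3.0};
        double centroidLengthSquared = dot(centroid, centroid); // 14.0

        System.out.println(direct(centroid, v));                          // prints 25.0
        System.out.println(expanded(centroidLengthSquared, centroid, v)); // prints 25.0
        System.out.println(asPosted(centroidLengthSquared, centroid, v)); // prints 39.0
    }
}
```

With these sample vectors the direct and expanded forms agree, while the arithmetic as posted is off by exactly |c|^2. The expanded form only pays off if |v|^2 is cached as well; otherwise it costs two passes over v instead of one.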