On Wed, Jan 27, 2010 at 10:33 AM, Ted Dunning <ted.dunn...@gmail.com> wrote:

> Yes.  That is correct.
>
> I had wanted vectors to cache this squared length, but the consensus was
> that the caller could do it more correctly.
>

Well, vectors *are* caching their squared length (see MAHOUT-208), for one
thing.

For another thing, the implementation of this optimization looks completely
wrong in SquaredEuclideanDistanceMeasure (and hence also in
EuclideanDistanceMeasure):


  @Override
  public double distance(Vector v1, Vector v2) {
    if (v1.size() != v2.size()) {
      throw new CardinalityException();
    }
    Vector vector = v1.minus(v2);
    return vector.dot(vector);
  }

So far so good - not terribly optimized here, because a new vector is
created, but mathematically correct (how is this different than just doing
v1.getDistanceSquared(v2), however?).

But then what is this trying to say:

  @Override
  public double distance(double centroidLengthSquare, Vector centroid, Vector v) {
    if (centroid.size() != v.size()) {
      throw new CardinalityException();
    }
    return centroidLengthSquare + v.getDistanceSquared(centroid);
  }

If you're calling this as distance(centroid.getLengthSquared(), centroid, v),
then this resolves to

  centroid.getLengthSquared() + v.getDistanceSquared(centroid)

Why is this supposed to be equal to

  centroid.minus(v).getLengthSquared()?
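
For comparison, the expansion that a cached centroid length would normally
feed into is |c - v|^2 = |c|^2 - 2 c.v + |v|^2. Below is a minimal sketch of
that identity next to the expression quoted above. It uses plain double[]
arrays rather than Mahout's Vector, and the class and method names are
hypothetical, made up for illustration only, not Mahout API. Whether this
expansion is what the three-argument overload was meant to compute is an
assumption.

```java
public final class SquaredDistanceSketch {
    private SquaredDistanceSketch() {}

    // Dot product of two equal-length arrays.
    static double dot(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("cardinality mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Squared Euclidean length, |a|^2.
    static double lengthSquared(double[] a) {
        return dot(a, a);
    }

    // Direct squared distance, |a - b|^2, without allocating a new array.
    static double distanceSquared(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("cardinality mismatch");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Expansion using a precomputed |c|^2: |c|^2 - 2 c.v + |v|^2.
    static double distanceSquaredCached(double centroidLengthSquared,
                                        double[] centroid, double[] v) {
        return centroidLengthSquared - 2.0 * dot(centroid, v) + lengthSquared(v);
    }

    // The expression as quoted in this message: |c|^2 + |v - c|^2.
    static double asQuoted(double centroidLengthSquared,
                           double[] centroid, double[] v) {
        return centroidLengthSquared + distanceSquared(v, centroid);
    }
}
```

With centroid = (1, 2) and v = (3, 5), the direct and cached forms both give
13, while the quoted expression gives 5 + 13 = 18, so the two only agree when
the centroid has zero length.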

  -jake
