Suneel is right. :)

Let me explain how this came up:
- When clustering, assigning a point to a cluster means the cluster's
centroid needs to be updated.
- To update the centroid in the nearest neighbor searcher classes, the
centroid must first be removed.
- To remove the centroid, we search for its closest vector (which should be
the centroid itself) and then remove that vector from the data structures.
=> However, when the centroid is all zeros, the nearest vector (which should
be itself) comes back with a distance of 1 rather than 0, and this trips a
check.
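
To make the steps above concrete, here is a minimal Java sketch. It is not Mahout's actual code: the class CosineDistanceSketch and all of its methods are hypothetical names, distanceAsReported only mimics the behavior described in this thread, and the 1e-9 tolerance in canRemove is an assumed stand-in for the check that gets tripped. Treating a zero vector against a non-zero vector as distance 1 is also an assumption, since the thread only covers the case where both are zero.

```java
import java.util.Arrays;
import java.util.List;

final class CosineDistanceSketch {
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // Stand-in for the behavior described in the thread: when the
    // denominator is 0 (so the dot product is 0 too), the result is 1.
    static double distanceAsReported(double[] a, double[] b) {
        double dotProduct = dot(a, b);
        double denominator = Math.sqrt(dot(a, a) * dot(b, b));
        if (denominator == 0.0) {
            return 1.0;
        }
        return 1.0 - dotProduct / denominator;
    }

    // Proposed convention: two all-zero vectors are at distance 0.
    // Assumption: a zero vector against a non-zero one stays at distance 1.
    static double distanceWithZeroCase(double[] a, double[] b) {
        double squaredNormA = dot(a, a);
        double squaredNormB = dot(b, b);
        if (squaredNormA == 0.0 && squaredNormB == 0.0) {
            return 0.0;
        }
        double denominator = Math.sqrt(squaredNormA * squaredNormB);
        if (denominator == 0.0) {
            return 1.0;
        }
        return 1.0 - dot(a, b) / denominator;
    }

    // Hypothetical removal check: the query counts as "found" only if its
    // nearest stored vector is at (nearly) zero distance.
    static boolean canRemove(List<double[]> stored, double[] query, boolean useZeroCase) {
        double best = Double.POSITIVE_INFINITY;
        for (double[] v : stored) {
            double d = useZeroCase
                ? distanceWithZeroCase(query, v)
                : distanceAsReported(query, v);
            best = Math.min(best, d);
        }
        return best < 1e-9;
    }

    public static void main(String[] args) {
        double[] zero = {0.0, 0.0};
        List<double[]> stored = Arrays.asList(zero, new double[] {1.0, 0.0});
        System.out.println(canRemove(stored, zero, false)); // prints false
        System.out.println(canRemove(stored, zero, true));  // prints true
    }
}
```

With the reported behavior the all-zero centroid never finds itself at distance 0, so the removal check fails; with the zero case handled, it does.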


On Thu, Apr 4, 2013 at 9:46 PM, Sean Owen <sro...@gmail.com> wrote:

> It sounds pretty undefined, but I would tend to define the distance as
> 0 in this case of course. And that means defining the cosine as 1.
> Which class in particular? There are a few implementations of this
> distance measure.
>
> On Thu, Apr 4, 2013 at 7:42 PM, Dan Filimon <dangeorge.fili...@gmail.com>
> wrote:
> > In the case where both vectors are all zeros, the angle between them is 0,
> > so the cosine is therefore 1 and so the distance returned should be 0
> > (unless I misunderstood what the distance does).
> >
> > In Mahout, when calling distance() however, if both the denominator and
> > dotProduct are 0 (which is true when both vectors are 0), the returned
> > value is 1.
> >
> > This looks like a bug to me and I would open a JIRA issue and fix it, but I
> > want to make sure there's nothing I could possibly be missing.
> >
> > Thoughts?
>
