I see the arguments for having it defined; I'm just raising the point that it's
a very strange spot to be in.

If every user's vector is all zeros except for one person who likes the lentil
soup, then all the other users are equally different from that person.

The problem for me is the discontinuity Sean mentions, where at zero you go
off a cliff and have no sense of distance.

But for convenience and "behaving nicely" I'm fine with distance between
zero vectors being zero.
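A minimal Java sketch of the convention under discussion, for illustration only. `CosineDistanceSketch` and `cosineDistance` are hypothetical names that work on plain `double[]` arrays. They are not Mahout's `Vector` or its distance-measure classes. The sketch returns 0 for two all-zero vectors. When exactly one vector is zero, it keeps returning 1. That choice is an assumption, based on the behavior described further down for a zero denominator and a zero dot product. `main` also shows the cliff: two orthogonal vectors stay at distance 1 however small they get, and only at exactly zero does the distance snap to 0.

```java
/**
 * Hypothetical sketch, not Mahout's implementation: cosine distance
 * 1 - (x . y) / (|x| |y|) on plain double arrays, with assumed conventions
 * for zero vectors.
 */
public class CosineDistanceSketch {

    public static double cosineDistance(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0, normXSq = 0.0, normYSq = 0.0;
        for (int i = 0; i < x.length; i++) {
            dot += x[i] * y[i];
            normXSq += x[i] * x[i];
            normYSq += y[i] * y[i];
        }
        boolean xZero = normXSq == 0.0;
        boolean yZero = normYSq == 0.0;
        if (xZero && yZero) {
            // Assumed convention: d(0, 0) = 0, so d(x, x) = 0 holds for every x.
            return 0.0;
        }
        if (xZero || yZero) {
            // Assumed convention: no angle exists, so report 1 (as if orthogonal).
            return 1.0;
        }
        double cosine = dot / (Math.sqrt(normXSq) * Math.sqrt(normYSq));
        return 1.0 - cosine;
    }

    public static void main(String[] args) {
        double[] zero = {0.0, 0.0};
        double[] soupFan = {1.0, 0.0}; // hypothetical user who only likes the lentil soup
        System.out.println("d(0, 0)       = " + cosineDistance(zero, zero));    // 0.0 by convention
        System.out.println("d(0, soupFan) = " + cosineDistance(zero, soupFan)); // 1.0
        // The cliff: orthogonal vectors stay at distance 1 as they shrink toward zero,
        // but at exactly zero the convention above jumps the distance to 0.
        for (double eps : new double[] {1.0, 1e-3, 1e-9}) {
            double[] a = {eps, 0.0};
            double[] b = {0.0, eps};
            System.out.println("eps = " + eps + ": d(a, b) = " + cosineDistance(a, b));
        }
    }
}
```

Returning 0 only in the both-zero case preserves d(x, x) = 0, which is what the centroid-removal check relies on. It does not make cosine distance a true metric, though: distinct vectors that point the same way, such as (1, 0) and (2, 0), are also at distance 0.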


On Thu, Apr 4, 2013 at 1:50 PM, Dan Filimon <dangeorge.fili...@gmail.com> wrote:

> While I agree that it's fairly meaningless mathematically, this ensures
> that the distance between two identical vectors is always 0.
> Think of yourself using this class through the DistanceMeasure interface.
> The implicit expectation [1] here is that d(x, y) = 0 iff x = y.
>
> [1] http://en.wikipedia.org/wiki/Metric_(mathematics)
>
>
> On Thu, Apr 4, 2013 at 11:40 PM, Andrew Musselman <andrew.mussel...@gmail.com> wrote:
>
> > I think it should return an "undefined" symbol. There is no angle between
> > two zero vectors.
> >
> > In a practical sense, taking two zero vectors to be equivalent in the
> > context of user-item vectors, say, is dodgy in my opinion. That is akin to
> > saying "If we both hate everything on this restaurant's menu we are the
> > same person."
> >
> >
> > On Thu, Apr 4, 2013 at 11:56 AM, Dan Filimon <dangeorge.fili...@gmail.com> wrote:
> >
> > > Suneel is right. :)
> > >
> > > Let me explain how this came up:
> > > - When clustering and assigning a point to a cluster, the centroid needs
> > > to be updated.
> > > - To update the centroid in the nearest neighbor searcher classes, the
> > > centroid must first be removed.
> > > - To remove the centroid, we get the closest vector (search for it, and it
> > > should be itself) and then remove it from the data structures.
> > > => However, when the centroid is 0, the nearest vector (which should be
> > > itself) has a huge distance (1 rather than 0), which trips a check.
> > >
> > >
> > > On Thu, Apr 4, 2013 at 9:46 PM, Sean Owen <sro...@gmail.com> wrote:
> > >
> > > > It sounds pretty undefined, but I would tend to define the distance as
> > > > 0 in this case of course. And that means defining the cosine as 1.
> > > > Which class in particular? There are a few implementations of this
> > > > distance measure.
> > > >
> > > > On Thu, Apr 4, 2013 at 7:42 PM, Dan Filimon <dangeorge.fili...@gmail.com> wrote:
> > > > > In the case where both vectors are all zeros, the angle between them is 0,
> > > > > so the cosine is therefore 1, and so the distance returned should be 0
> > > > > (unless I misunderstood what the distance does).
> > > > >
> > > > > In Mahout, however, when calling distance(), if both the denominator and
> > > > > dotProduct are 0 (which is true when both vectors are 0), the returned
> > > > > value is 1.
> > > > >
> > > > > This looks like a bug to me, and I would open a JIRA issue and fix it, but
> > > > > I want to make sure there's nothing I could possibly be missing.
> > > > >
> > > > > Thoughts?
> > > >
> > >
> >
>
