None of this normally matters when cosine distance is used, since it is
usually applied to normalized vectors.  For that set of vectors, which
excludes the zero vector, it is a well-defined measure.
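
To make the zero-vector case concrete, here is a minimal sketch of a cosine
distance with the convention discussed in this thread (two zero vectors are
at distance 0).  It is not Mahout's implementation: the function name
cosine_distance is hypothetical, and returning 1 when exactly one vector is
zero is an assumption of this sketch, not something the thread settled.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cos(angle between a and b) for two equal-length sequences.

    Hypothetical helper for illustration only; not Mahout's API.
    """
    dot = sum(x * y for x, y in zip(a, b))
    denom = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if denom == 0.0:
        # At least one vector has zero length, so the angle is undefined.
        both_zero = all(x == 0 for x in a) and all(y == 0 for y in b)
        # Convention from the thread: identical zero vectors are at distance 0.
        # Assumption of this sketch: zero vs. nonzero is reported as 1.
        return 0.0 if both_zero else 1.0
    # Clamp so rounding error cannot push the cosine outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / denom))
    return 1.0 - cosine
```

With this convention a zero centroid is at distance 0 from itself, so a
"find the nearest vector, then remove it" step no longer trips over the
value 1.  For normalized inputs the special case is never reached.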


On Thu, Apr 4, 2013 at 11:25 PM, Andrew Musselman <
andrew.mussel...@gmail.com> wrote:

> I agree 1 is wrong :)
>
>
> On Thu, Apr 4, 2013 at 2:22 PM, Dan Filimon <dangeorge.fili...@gmail.com
> >wrote:
>
> > Ah, okay then. :)
> > I thought that you depend on the current convention that it returns 1.
> So,
> > disclaimers aside, you're fine with the change?
> >
> >
> > On Fri, Apr 5, 2013 at 12:20 AM, Sebastian Schelter <
> > ssc.o...@googlemail.com
> > > wrote:
> >
> > > You can ignore the recommender stuff for the DistanceMeasure classes,
> as
> > > the recommenders use their own distance/similarity implementations.
> > >
> > > I just wanted to comment on the example that Andrew gave, to mention
> > > that there are some common pitfalls with modeling ratings/interactions.
> > >
> > > On 04.04.2013 23:14, Dan Filimon wrote:
> > > > Right, that's fair. So, you're saying there needs to be a special
> value
> > > > when both vectors are 0 for the recommender system to work?
> > > > And that 0 means dislike, which isn't in fact accurate. You want to
> > > convey
> > > > lack of information.
> > > >
> > > > But now, the code returns 1. Is that a special value? I'd guess it
> > means
> > > > you like it by default...?
> > > >
> > > >
> > > > On Fri, Apr 5, 2013 at 12:11 AM, Sebastian Schelter <
> > > ssc.o...@googlemail.com
> > > >> wrote:
> > > >
> > > >> In recommender systems, it's dangerous to interpret "no interaction"
> > as
> > > >> dislike. Think of all movies you never watched, do you really
> dislike
> > > >> them all? :)
> > > >>
> > > >>
> > > >> On 04.04.2013 23:03, Andrew Musselman wrote:
> > > >>> I agree; I mis-spoke before if I said "dislike".  Zero to me means
> > > >>> literally nothing.  No interaction.  Which could be either "don't
> > > like",
> > > >>> "don't like today", "dislike", etc.  Which adds to the
> > meaninglessness
> > > of
> > > >>> it.
> > > >>>
> > > >>>
> > > >>> On Thu, Apr 4, 2013 at 2:00 PM, Sebastian Schelter
> > > >>> <ssc.o...@googlemail.com>wrote:
> > > >>>
> > > >>>> I think that in our recommender code, 0 should mean no rating or
> > > >>>> no interaction observed. I think modeling dislike with 0 creates
> > > >>>> a lot of unnecessary problems.
> > > >>>>
> > > >>>> On 04.04.2013 22:56, Andrew Musselman wrote:
> > > >>>>> I see the arguments for having it defined, just raising the point
> > > that
> > > >>>> it's
> > > >>>>> a very strange spot to be in.
> > > >>>>>
> > > >>>>> If all users are zero except for one person who likes the lentil
> > > soup,
> > > >>>> then
> > > >>>>> the other users are equally different from that person.
> > > >>>>>
> > > >>>>> The problem for me is the discontinuity Sean mentions, where at
> > zero
> > > >> you
> > > >>>> go
> > > >>>>> off a cliff and have no sense of distance.
> > > >>>>>
> > > >>>>> But for convenience and "behaving nicely" I'm fine with distance
> > > >> between
> > > >>>>> zero vectors being zero.
> > > >>>>>
> > > >>>>>
> > > >>>>> On Thu, Apr 4, 2013 at 1:50 PM, Dan Filimon <
> > > >> dangeorge.fili...@gmail.com
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>>> While I agree that it's fairly meaningless mathematically, this
> > > >>>>>> ensures that the distance between two identical vectors is
> > > >>>>>> always 0.
> > > >>>>>> Think of yourself using this class through the DistanceMeasure
> > > >>>>>> interface.
> > > >>>>>> The implicit expectation [1] here is that d(x, y) = 0 iff x = y.
> > > >>>>>>
> > > >>>>>> [1] http://en.wikipedia.org/wiki/Metric_(mathematics)
> > > >>>>>>
> > > >>>>>>
> > > >>>>>> On Thu, Apr 4, 2013 at 11:40 PM, Andrew Musselman <
> > > >>>>>> andrew.mussel...@gmail.com> wrote:
> > > >>>>>>
> > > >>>>>>> I think it should return an "undefined" symbol.  There is no
> > angle
> > > >>>>>> between
> > > >>>>>>> two zero vectors.
> > > >>>>>>>
> > > >>>>>>> In a practical sense, taking two zero vectors to be equivalent
> in
> > > the
> > > >>>>>>> context of user-item vectors, say, is dodgy in my opinion.
>  That
> > is
> > > >>>> akin
> > > >>>>>> to
> > > >>>>>>> saying "If we both hate everything on this restaurant's menu we
> > are
> > > >> the
> > > >>>>>>> same person."
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>> On Thu, Apr 4, 2013 at 11:56 AM, Dan Filimon <
> > > >>>>>> dangeorge.fili...@gmail.com
> > > >>>>>>>> wrote:
> > > >>>>>>>
> > > >>>>>>>> Suneel is right. :)
> > > >>>>>>>>
> > > >>>>>>>> Let me explain how this came up:
> > > >>>>>>>> - When clustering, and assigning a point to a cluster, the
> > > centroid
> > > >>>>>> needs
> > > >>>>>>>> to be updated.
> > > >>>>>>>> - To update the centroid in the nearest neighbor searcher
> > classes,
> > > >> the
> > > >>>>>>>> centroid must first be removed.
> > > >>>>>>>> - To remove the centroid, we get the closest vector (search
> for
> > > it,
> > > >>>> and
> > > >>>>>>> it
> > > >>>>>>>> should be itself) and then remove it from the data structures.
> > > >>>>>>>> => However, when the centroid is 0, the nearest vector (which
> > > should
> > > >>>> be
> > > >>>>>>>> itself) has a huge distance (1 rather than 0) and this trips a
> > > >> check.
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> On Thu, Apr 4, 2013 at 9:46 PM, Sean Owen <sro...@gmail.com>
> > > wrote:
> > > >>>>>>>>
> > > >>>>>>>>> It sounds pretty undefined, but I would tend to define the
> > > distance
> > > >>>>>> as
> > > >>>>>>>>> 0 in this case of course. And that means defining the cosine
> as
> > > 1.
> > > >>>>>>>>> Which class in particular? There are a few implementations of
> > > this
> > > >>>>>>>>> distance measure.
> > > >>>>>>>>>
> > > >>>>>>>>> On Thu, Apr 4, 2013 at 7:42 PM, Dan Filimon <
> > > >>>>>>> dangeorge.fili...@gmail.com
> > > >>>>>>>>>
> > > >>>>>>>>> wrote:
> > > >>>>>>>>>> In the case where both vectors are all zeros, the angle
> > > >>>>>>>>>> between them is 0, so the cosine is 1 and the distance
> > > >>>>>>>>>> returned should be 0
> > > >>>>>>>>>> (unless I misunderstood what the distance does).
> > > >>>>>>>>>>
> > > >>>>>>>>>> In Mahout, when calling distance() however, if both the
> > > >> denominator
> > > >>>>>>> and
> > > >>>>>>>>>> dotProduct are 0 (which is true when both vectors are 0),
> the
> > > >>>>>>> returned
> > > >>>>>>>>>> value is 1.
> > > >>>>>>>>>>
> > > >>>>>>>>>> This looks like a bug to me and I would open a JIRA issue
> and
> > > fix
> > > >>>>>> it
> > > >>>>>>>> but
> > > >>>>>>>>> I
> > > >>>>>>>>>> want to make sure there's nothing I could possibly be
> missing.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Thoughts?
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>
> > > >>>>
> > > >>>
> > > >>
> > > >>
> > > >
> > >
> > >
> >
>
