Ah, okay then. :)
I thought that you depended on the current convention of returning 1. So,
disclaimers aside, you're fine with the change?
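
For concreteness, here is a minimal sketch of the change under discussion.
It is not Mahout's actual code: the class name CosineDistanceSketch and the
plain double[] signature are assumptions made for illustration. Returning 1
when only one of the vectors is zero is also an assumed choice, since the
thread only covers the case where both are zero.

```java
final class CosineDistanceSketch {
    private CosineDistanceSketch() {}

    /** Cosine distance 1 - cos(a, b), with the zero-vector cases defined explicitly. */
    static double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dotProduct = 0.0;
        double normSquaredA = 0.0;
        double normSquaredB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dotProduct += a[i] * b[i];
            normSquaredA += a[i] * a[i];
            normSquaredB += b[i] * b[i];
        }
        // Proposed convention: two zero vectors are identical, so d(0, 0) = 0.
        if (normSquaredA == 0.0 && normSquaredB == 0.0) {
            return 0.0;
        }
        // Assumption (not settled in the thread): if only one vector is zero,
        // the cosine is taken to be 0, so the distance is 1.
        if (normSquaredA == 0.0 || normSquaredB == 0.0) {
            return 1.0;
        }
        double cosine = dotProduct / (Math.sqrt(normSquaredA) * Math.sqrt(normSquaredB));
        // Clamp to [-1, 1] to absorb floating-point rounding.
        cosine = Math.max(-1.0, Math.min(1.0, cosine));
        return 1.0 - cosine;
    }

    public static void main(String[] args) {
        double[] zero = {0.0, 0.0};
        System.out.println(distance(zero, zero));
    }
}
```

With this convention a zero centroid finds itself at distance 0, so the
nearest-neighbor removal check described further down the thread would pass.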


On Fri, Apr 5, 2013 at 12:20 AM, Sebastian Schelter <ssc.o...@googlemail.com
> wrote:

> You can ignore the recommender stuff for the DistanceMeasure classes, as
> the recommenders use their own distance/similarity implementations.
>
> I just wanted to comment on the example that Andrew gave, to mention
> that there are some common pitfalls with modeling ratings/interactions.
>
> On 04.04.2013 23:14, Dan Filimon wrote:
> > Right, that's fair. So, you're saying there needs to be a special value
> > when both vectors are 0 for the recommender system to work?
> > And that 0 means dislike, which isn't in fact accurate. You want to convey
> > lack of information.
> >
> > But now, the code returns 1. Is that a special value? I'd guess it means
> > you like it by default...?
> >
> >
> > On Fri, Apr 5, 2013 at 12:11 AM, Sebastian Schelter <
> ssc.o...@googlemail.com
> >> wrote:
> >
> >> In recommender systems, it's dangerous to interpret "no interaction" as
> >> dislike. Think of all movies you never watched, do you really dislike
> >> them all? :)
> >>
> >>
> >> On 04.04.2013 23:03, Andrew Musselman wrote:
> >>> I agree; I misspoke before if I said "dislike".  Zero to me means
> >>> literally nothing.  No interaction.  Which could be "don't like",
> >>> "don't like today", "dislike", etc.  Which adds to the meaninglessness
> >>> of it.
> >>>
> >>>
> >>> On Thu, Apr 4, 2013 at 2:00 PM, Sebastian Schelter
> >>> <ssc.o...@googlemail.com>wrote:
> >>>
> >>>> I think that in our recommender code, 0 should mean no rating or no
> >>>> interaction observed. I think modeling dislike with 0 creates a lot of
> >>>> unnecessary problems.
> >>>>
> >>>> On 04.04.2013 22:56, Andrew Musselman wrote:
> >>>>> I see the arguments for having it defined, just raising the point that
> >>>>> it's a very strange spot to be in.
> >>>>>
> >>>>> If all users are zero except for one person who likes the lentil soup,
> >>>>> then the other users are equally different from that person.
> >>>>>
> >>>>> The problem for me is the discontinuity Sean mentions, where at zero you
> >>>>> go off a cliff and have no sense of distance.
> >>>>>
> >>>>> But for convenience and "behaving nicely" I'm fine with distance between
> >>>>> zero vectors being zero.
> >>>>>
> >>>>>
> >>>>> On Thu, Apr 4, 2013 at 1:50 PM, Dan Filimon <
> >> dangeorge.fili...@gmail.com
> >>>>> wrote:
> >>>>>
> >>>>>> While I agree that it's fairly meaningless mathematically, this ensures
> >>>>>> that the distance between two identical vectors is always 0.
> >>>>>> Think of yourself using this class through the DistanceMeasure interface.
> >>>>>> The implicit expectation [1] here is that d(x, y) = 0 iff x = y.
> >>>>>>
> >>>>>> [1] http://en.wikipedia.org/wiki/Metric_(mathematics)
> >>>>>>
> >>>>>>
> >>>>>> On Thu, Apr 4, 2013 at 11:40 PM, Andrew Musselman <
> >>>>>> andrew.mussel...@gmail.com> wrote:
> >>>>>>
> >>>>>>> I think it should return an "undefined" symbol.  There is no angle between
> >>>>>>> two zero vectors.
> >>>>>>>
> >>>>>>> In a practical sense, taking two zero vectors to be equivalent in the
> >>>>>>> context of user-item vectors, say, is dodgy in my opinion.  That is akin to
> >>>>>>> saying "If we both hate everything on this restaurant's menu we are the
> >>>>>>> same person."
> >>>>>>>
> >>>>>>>
> >>>>>>> On Thu, Apr 4, 2013 at 11:56 AM, Dan Filimon <
> >>>>>> dangeorge.fili...@gmail.com
> >>>>>>>> wrote:
> >>>>>>>
> >>>>>>>> Suneel is right. :)
> >>>>>>>>
> >>>>>>>> Let me explain how this came up:
> >>>>>>>> - When clustering, and assigning a point to a cluster, the centroid
> >>>>>>>> needs to be updated.
> >>>>>>>> - To update the centroid in the nearest neighbor searcher classes, the
> >>>>>>>> centroid must first be removed.
> >>>>>>>> - To remove the centroid, we get the closest vector (search for it, and
> >>>>>>>> it should be itself) and then remove it from the data structures.
> >>>>>>>> => However, when the centroid is 0, the nearest vector (which should be
> >>>>>>>> itself) has a huge distance (1 rather than 0) and this trips a check.
> >>>>>>>>
> >>>>>>>> On Thu, Apr 4, 2013 at 9:46 PM, Sean Owen <sro...@gmail.com>
> wrote:
> >>>>>>>>
> >>>>>>>>> It sounds pretty undefined, but I would tend to define the distance as
> >>>>>>>>> 0 in this case of course. And that means defining the cosine as 1.
> >>>>>>>>> Which class in particular? There are a few implementations of this
> >>>>>>>>> distance measure.
> >>>>>>>>>
> >>>>>>>>> On Thu, Apr 4, 2013 at 7:42 PM, Dan Filimon <
> >>>>>>> dangeorge.fili...@gmail.com
> >>>>>>>>>
> >>>>>>>>> wrote:
> >>>>>>>>>> In the case where both vectors are all zeros, the angle between them is
> >>>>>>>>>> 0, so the cosine is therefore 1 and so the distance returned should be 0
> >>>>>>>>>> (unless I misunderstood what the distance does).
> >>>>>>>>>>
> >>>>>>>>>> In Mahout, when calling distance() however, if both the denominator and
> >>>>>>>>>> dotProduct are 0 (which is true when both vectors are 0), the returned
> >>>>>>>>>> value is 1.
> >>>>>>>>>>
> >>>>>>>>>> This looks like a bug to me and I would open a JIRA issue and fix it but
> >>>>>>>>>> I want to make sure there's nothing I could possibly be missing.
> >>>>>>>>>>
> >>>>>>>>>> Thoughts?
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>
> >>
> >>
> >
>
>
