Right, that's fair. So, you're saying there needs to be a special value
when both vectors are 0 for the recommender system to work?
And that reading 0 as dislike isn't in fact accurate; you want it to convey
lack of information.

But now, the code returns 1. Is that a special value? I'd guess it means
you like it by default...?
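
To make the two conventions concrete, here is a minimal sketch in plain
Python. It is not Mahout's actual code: the function name and the
`both_zero` parameter are made up for illustration, and returning 1 when
only one of the vectors is zero is my own assumption.

```python
import math

def cosine_distance(a, b, both_zero=0.0):
    """Return 1 - cos(angle between a and b).

    `both_zero` (hypothetical parameter) is the value returned when both
    vectors are all zeros, where the angle is undefined.
    """
    dot = sum(x * y for x, y in zip(a, b))
    denom = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if denom == 0.0:
        if not any(a) and not any(b):
            # Both vectors are zero: the convention under discussion.
            return both_zero
        # Exactly one vector is zero: assumed here to be maximally distant.
        return 1.0
    return 1.0 - dot / denom

zero = [0.0, 0.0, 0.0]
print(cosine_distance(zero, zero))                 # proposed convention
print(cosine_distance(zero, zero, both_zero=1.0))  # behaviour reported in the thread
```

With `both_zero=0.0` a zero vector is at distance 0 from itself, which is
what the nearest-neighbor lookup described further down the thread relies
on; with `both_zero=1.0` it ends up as far from itself as from an
orthogonal vector.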


On Fri, Apr 5, 2013 at 12:11 AM, Sebastian Schelter <ssc.o...@googlemail.com> wrote:

> In recommender systems, it's dangerous to interpret "no interaction" as
> dislike. Think of all movies you never watched, do you really dislike
> them all? :)
>
>
> On 04.04.2013 23:03, Andrew Musselman wrote:
> > I agree; I mis-spoke before if I said "dislike".  Zero to me means
> > literally nothing.  No interaction.  Which could be either "don't like",
> > "don't like today", "dislike", etc.  Which adds to the meaninglessness of
> > it.
> >
> >
> > On Thu, Apr 4, 2013 at 2:00 PM, Sebastian Schelter <ssc.o...@googlemail.com> wrote:
> >
> >> I think that in our recommender code, 0 should mean no rating or no
> >> interaction observed. I think modeling dislike with 0 creates a lot of
> >> unnecessary problems.
> >>
> >> On 04.04.2013 22:56, Andrew Musselman wrote:
> >>> I see the arguments for having it defined, just raising the point that
> >>> it's a very strange spot to be in.
> >>>
> >>> If all users are zero except for one person who likes the lentil soup,
> >>> then the other users are equally different from that person.
> >>>
> >>> The problem for me is the discontinuity Sean mentions, where at zero
> >>> you go off a cliff and have no sense of distance.
> >>>
> >>> But for convenience and "behaving nicely" I'm fine with distance
> >>> between zero vectors being zero.
> >>>
> >>>
> >>> On Thu, Apr 4, 2013 at 1:50 PM, Dan Filimon <dangeorge.fili...@gmail.com> wrote:
> >>>
> >>>> While I agree that it's fairly meaningless mathematically, this ensures
> >>>> that "the distance between two vectors that are the same is 0" always
> >>>> holds.
> >>>> Think of yourself using this class through the DistanceMeasure interface.
> >>>> The implicit expectation [1] here is that d(x, y) = 0 iff x = y.
> >>>>
> >>>> [1] http://en.wikipedia.org/wiki/Metric_(mathematics)
> >>>>
> >>>>
> >>>> On Thu, Apr 4, 2013 at 11:40 PM, Andrew Musselman <
> >>>> andrew.mussel...@gmail.com> wrote:
> >>>>
> >>>>> I think it should return an "undefined" symbol.  There is no angle
> >>>>> between two zero vectors.
> >>>>>
> >>>>> In a practical sense, taking two zero vectors to be equivalent in the
> >>>>> context of user-item vectors, say, is dodgy in my opinion.  That is
> >>>>> akin to saying "If we both hate everything on this restaurant's menu
> >>>>> we are the same person."
> >>>>>
> >>>>>
> >>>>> On Thu, Apr 4, 2013 at 11:56 AM, Dan Filimon <dangeorge.fili...@gmail.com> wrote:
> >>>>>
> >>>>>> Suneel is right. :)
> >>>>>>
> >>>>>> Let me explain how this came up:
> >>>>>> - When clustering, and assigning a point to a cluster, the centroid
> >>>>>>   needs to be updated.
> >>>>>> - To update the centroid in the nearest neighbor searcher classes,
> >>>>>>   the centroid must first be removed.
> >>>>>> - To remove the centroid, we get the closest vector (search for it,
> >>>>>>   and it should be itself) and then remove it from the data structures.
> >>>>>> => However, when the centroid is 0, the nearest vector (which should
> >>>>>>    be itself) has a huge distance (1 rather than 0) and this trips a
> >>>>>>    check.
> >>>>>>
> >>>>>>
> >>>>>> On Thu, Apr 4, 2013 at 9:46 PM, Sean Owen <sro...@gmail.com> wrote:
> >>>>>>
> >>>>>>> It sounds pretty undefined, but I would tend to define the distance
> >>>>>>> as 0 in this case of course. And that means defining the cosine as 1.
> >>>>>>> Which class in particular? There are a few implementations of this
> >>>>>>> distance measure.
> >>>>>>>
> >>>>>>> On Thu, Apr 4, 2013 at 7:42 PM, Dan Filimon <dangeorge.fili...@gmail.com> wrote:
> >>>>>>>> In the case where both vectors are all zeros, the angle between
> >>>>>>>> them is 0, so the cosine is therefore 1 and so the distance returned
> >>>>>>>> should be 0 (unless I misunderstood what the distance does).
> >>>>>>>>
> >>>>>>>> In Mahout, when calling distance() however, if both the denominator
> >>>>>>>> and dotProduct are 0 (which is true when both vectors are 0), the
> >>>>>>>> returned value is 1.
> >>>>>>>>
> >>>>>>>> This looks like a bug to me and I would open a JIRA issue and fix
> >>>>>>>> it, but I want to make sure there's nothing I could possibly be
> >>>>>>>> missing.
> >>>>>>>>
> >>>>>>>> Thoughts?
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >>
> >
>
>
