I agree 1 is wrong :)

On Thu, Apr 4, 2013 at 2:22 PM, Dan Filimon <dangeorge.fili...@gmail.com> wrote:

> Ah, okay then. :)
> I thought that you depend on the current convention that it returns 1. So,
> disclaimers aside, you're fine with the change?
>
>
> On Fri, Apr 5, 2013 at 12:20 AM, Sebastian Schelter <ssc.o...@googlemail.com> wrote:
>
> > You can ignore the recommender stuff for the DistanceMeasure classes, as
> > the recommenders use their own distance/similarity implementations.
> >
> > I just wanted to comment on the example that Andrew gave, to mention
> > that there are some common pitfalls with modeling ratings/interactions.
> >
> > On 04.04.2013 23:14, Dan Filimon wrote:
> > > Right, that's fair. So, you're saying there needs to be a special value
> > > when both vectors are 0 for the recommender system to work?
> > > And that 0 means dislike, which isn't in fact accurate. You want to convey
> > > lack of information.
> > >
> > > But now, the code returns 1. Is that a special value? I'd guess it means
> > > you like it by default...?
> > >
> > >
> > > On Fri, Apr 5, 2013 at 12:11 AM, Sebastian Schelter <ssc.o...@googlemail.com> wrote:
> > >
> > >> In recommender systems, it's dangerous to interpret "no interaction" as
> > >> dislike. Think of all movies you never watched, do you really dislike
> > >> them all? :)
> > >>
> > >>
> > >> On 04.04.2013 23:03, Andrew Musselman wrote:
> > >>> I agree; I mis-spoke before if I said "dislike".  Zero to me means
> > >>> literally nothing.  No interaction.  Which could be either "don't like",
> > >>> "don't like today", "dislike", etc.  Which adds to the meaninglessness of
> > >>> it.
> > >>>
> > >>>
> > >>> On Thu, Apr 4, 2013 at 2:00 PM, Sebastian Schelter <ssc.o...@googlemail.com> wrote:
> > >>>
> > >>>> I think that in our recommender code, 0 should mean no rating or no
> > >>>> interaction observed. I think modeling dislike with 0 creates a lot of
> > >>>> unnecessary problems.
> > >>>>
> > >>>> On 04.04.2013 22:56, Andrew Musselman wrote:
> > >>>>> I see the arguments for having it defined, just raising the point that it's
> > >>>>> a very strange spot to be in.
> > >>>>>
> > >>>>> If all users are zero except for one person who likes the lentil soup, then
> > >>>>> the other users are equally different from that person.
> > >>>>>
> > >>>>> The problem for me is the discontinuity Sean mentions, where at zero you go
> > >>>>> off a cliff and have no sense of distance.
> > >>>>>
> > >>>>> But for convenience and "behaving nicely" I'm fine with distance between
> > >>>>> zero vectors being zero.
> > >>>>>
> > >>>>>
> > >>>>> On Thu, Apr 4, 2013 at 1:50 PM, Dan Filimon <dangeorge.fili...@gmail.com> wrote:
> > >>>>>
> > >>>>>> While I agree that it's fairly meaningless mathematically, this ensures
> > >>>>>> that the property "the distance between two identical vectors is 0"
> > >>>>>> always holds.
> > >>>>>> Think of yourself using this class through the DistanceMeasure interface.
> > >>>>>> The implicit expectation [1] here is that d(x, y) = 0 iff x = y.
> > >>>>>>
> > >>>>>> [1] http://en.wikipedia.org/wiki/Metric_(mathematics)
> > >>>>>>
> > >>>>>>
> > >>>>>> On Thu, Apr 4, 2013 at 11:40 PM, Andrew Musselman <
> > >>>>>> andrew.mussel...@gmail.com> wrote:
> > >>>>>>
> > >>>>>>> I think it should return an "undefined" symbol.  There is no angle between
> > >>>>>>> two zero vectors.
> > >>>>>>>
> > >>>>>>> In a practical sense, taking two zero vectors to be equivalent in the
> > >>>>>>> context of user-item vectors, say, is dodgy in my opinion.  That is akin to
> > >>>>>>> saying "If we both hate everything on this restaurant's menu we are the
> > >>>>>>> same person."
> > >>>>>>>
> > >>>>>>>
> > >>>>>>> On Thu, Apr 4, 2013 at 11:56 AM, Dan Filimon <dangeorge.fili...@gmail.com> wrote:
> > >>>>>>>
> > >>>>>>>> Suneel is right. :)
> > >>>>>>>>
> > >>>>>>>> Let me explain how this came up:
> > >>>>>>>> - When clustering, and assigning a point to a cluster, the centroid needs
> > >>>>>>>>   to be updated.
> > >>>>>>>> - To update the centroid in the nearest neighbor searcher classes, the
> > >>>>>>>>   centroid must first be removed.
> > >>>>>>>> - To remove the centroid, we get the closest vector (search for it, and it
> > >>>>>>>>   should be itself) and then remove it from the data structures.
> > >>>>>>>> => However, when the centroid is 0, the nearest vector (which should be
> > >>>>>>>>    itself) has a huge distance (1 rather than 0) and this trips a check.
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On Thu, Apr 4, 2013 at 9:46 PM, Sean Owen <sro...@gmail.com> wrote:
> > >>>>>>>>
> > >>>>>>>>> It sounds pretty undefined, but I would tend to define the distance as
> > >>>>>>>>> 0 in this case of course. And that means defining the cosine as 1.
> > >>>>>>>>> Which class in particular? There are a few implementations of this
> > >>>>>>>>> distance measure.
> > >>>>>>>>>
> > >>>>>>>>> On Thu, Apr 4, 2013 at 7:42 PM, Dan Filimon <dangeorge.fili...@gmail.com> wrote:
> > >>>>>>>>>> In the case where both vectors are all zeros, the angle between them is 0,
> > >>>>>>>>>> so the cosine is therefore 1 and so the distance returned should be 0
> > >>>>>>>>>> (unless I misunderstood what the distance does).
> > >>>>>>>>>>
> > >>>>>>>>>> In Mahout, when calling distance() however, if both the denominator and
> > >>>>>>>>>> dotProduct are 0 (which is true when both vectors are 0), the returned
> > >>>>>>>>>> value is 1.
> > >>>>>>>>>>
> > >>>>>>>>>> This looks like a bug to me and I would open a JIRA issue and fix it but I
> > >>>>>>>>>> want to make sure there's nothing I could possibly be missing.
> > >>>>>>>>>>
> > >>>>>>>>>> Thoughts?
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>>
> > >>>
> > >>
> > >>
> > >
> >
> >
>
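
For reference, the behaviour the thread converges on (distance 0 between two
all-zero vectors) could be sketched roughly as below. This is only an
illustration over plain double[] arrays and is not Mahout's actual
DistanceMeasure code: the class name CosineDistanceSketch is hypothetical, and
returning 1 when exactly one of the vectors is zero is an assumption made for
the example, since the thread only settles the case where both are zero.

```java
// Hypothetical sketch; not the Mahout implementation.
public final class CosineDistanceSketch {
    private CosineDistanceSketch() {}

    /** Cosine distance 1 - cos(a, b) for two arrays of equal length. */
    public static double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dotProduct = 0.0;
        double normSquaredA = 0.0;
        double normSquaredB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dotProduct += a[i] * b[i];
            normSquaredA += a[i] * a[i];
            normSquaredB += b[i] * b[i];
        }
        // Both vectors are zero: treat them as identical, so d(x, x) = 0 holds.
        if (normSquaredA == 0.0 && normSquaredB == 0.0) {
            return 0.0;
        }
        // Exactly one vector is zero: the angle is undefined.
        // Returning 1 here is an assumption of this sketch.
        if (normSquaredA == 0.0 || normSquaredB == 0.0) {
            return 1.0;
        }
        double cosine = dotProduct / (Math.sqrt(normSquaredA) * Math.sqrt(normSquaredB));
        // Clamp to [-1, 1] to absorb floating-point rounding.
        cosine = Math.max(-1.0, Math.min(1.0, cosine));
        return 1.0 - cosine;
    }
}
```

With this convention, a nearest-neighbor lookup for a zero centroid finds
itself at distance 0, which is the case that tripped the check described
earlier in the thread.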
