Ah, okay then. :) I thought you depended on the current convention that it returns 1. So, disclaimers aside, you're fine with the change?
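For reference, here is a minimal sketch of the change under discussion. It is an illustration over plain double[] arrays, not Mahout's CosineDistanceMeasure, and the class name CosineDistanceSketch is hypothetical. The thread only settles the case where both vectors are zero; returning 1 when exactly one vector is zero is an assumption of this sketch.

```java
/**
 * Hypothetical sketch of the proposed convention, not Mahout's CosineDistanceMeasure.
 * Vectors are plain double[] arrays for illustration.
 */
public final class CosineDistanceSketch {

  private CosineDistanceSketch() {}

  /** Cosine distance 1 - (a . b) / (|a| |b|), with the zero-vector cases made explicit. */
  public static double distance(double[] a, double[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vectors must have the same length");
    }
    double dot = 0.0;
    double normASquared = 0.0;
    double normBSquared = 0.0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normASquared += a[i] * a[i];
      normBSquared += b[i] * b[i];
    }
    if (normASquared == 0.0 && normBSquared == 0.0) {
      // Proposed convention: two zero vectors are identical, so d(0, 0) = 0.
      return 0.0;
    }
    if (normASquared == 0.0 || normBSquared == 0.0) {
      // Assumption: zero vs. nonzero keeps the old value of 1 (cosine taken as 0).
      return 1.0;
    }
    // Square root of the product of squared norms, used as the denominator.
    double cosine = dot / Math.sqrt(normASquared * normBSquared);
    // Clamp rounding error so the cosine stays within [-1, 1].
    cosine = Math.max(-1.0, Math.min(1.0, cosine));
    return 1.0 - cosine;
  }

  public static void main(String[] args) {
    double[] zero = {0.0, 0.0, 0.0};
    double[] x = {1.0, 2.0, 0.0};
    double[] y = {0.0, 0.0, 3.0};
    System.out.println(distance(zero, zero)); // prints 0.0
    System.out.println(distance(x, zero));    // prints 1.0
    System.out.println(distance(x, y));       // prints 1.0 (orthogonal)
  }
}
```

Taking the square root of the product of the squared norms, rather than multiplying two square roots, tends to keep d(x, x) at 0 for identical inputs; the clamp guards against rounding pushing the cosine slightly outside [-1, 1].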
On Fri, Apr 5, 2013 at 12:20 AM, Sebastian Schelter <ssc.o...@googlemail.com> wrote:

> You can ignore the recommender stuff for the DistanceMeasure classes, as
> the recommenders use their own distance/similarity implementations.
>
> I just wanted to comment on the example that Andrew gave, to mention
> that there are some common pitfalls with modeling ratings/interactions.
>
> On 04.04.2013 23:14, Dan Filimon wrote:
> > Right, that's fair. So, you're saying there needs to be a special value
> > when both vectors are 0 for the recommender system to work?
> > And that reading 0 as dislike isn't in fact accurate; you want to convey
> > a lack of information.
> >
> > But right now, the code returns 1. Is that a special value? I'd guess it
> > means you like it by default...?
> >
> >
> > On Fri, Apr 5, 2013 at 12:11 AM, Sebastian Schelter <ssc.o...@googlemail.com> wrote:
> >
> >> In recommender systems, it's dangerous to interpret "no interaction" as
> >> dislike. Think of all the movies you've never watched: do you really
> >> dislike them all? :)
> >>
> >>
> >> On 04.04.2013 23:03, Andrew Musselman wrote:
> >>> I agree; I misspoke before if I said "dislike". Zero to me means
> >>> literally nothing: no interaction. That could be "don't like",
> >>> "don't like today", "dislike", etc., which adds to the meaninglessness
> >>> of it.
> >>>
> >>>
> >>> On Thu, Apr 4, 2013 at 2:00 PM, Sebastian Schelter <ssc.o...@googlemail.com> wrote:
> >>>
> >>>> I think that in our recommender code, 0 should mean no rating or no
> >>>> interaction observed. I think modeling dislike with 0 creates a lot of
> >>>> unnecessary problems.
> >>>>
> >>>> On 04.04.2013 22:56, Andrew Musselman wrote:
> >>>>> I see the arguments for having it defined; I'm just raising the point
> >>>>> that it's a very strange spot to be in.
> >>>>>
> >>>>> If all users are zero except for one person who likes the lentil
> >>>>> soup, then the other users are all equally different from that person.
> >>>>>
> >>>>> The problem for me is the discontinuity Sean mentions, where at zero
> >>>>> you go off a cliff and have no sense of distance.
> >>>>>
> >>>>> But for convenience and "behaving nicely", I'm fine with the distance
> >>>>> between zero vectors being zero.
> >>>>>
> >>>>>
> >>>>> On Thu, Apr 4, 2013 at 1:50 PM, Dan Filimon <dangeorge.fili...@gmail.com> wrote:
> >>>>>
> >>>>>> While I agree that it's fairly meaningless mathematically, this
> >>>>>> ensures that the distance between two identical vectors is always 0.
> >>>>>> Think of yourself using this class through the DistanceMeasure
> >>>>>> interface. The implicit expectation [1] here is that d(x, y) = 0
> >>>>>> iff x = y.
> >>>>>>
> >>>>>> [1] http://en.wikipedia.org/wiki/Metric_(mathematics)
> >>>>>>
> >>>>>>
> >>>>>> On Thu, Apr 4, 2013 at 11:40 PM, Andrew Musselman <andrew.mussel...@gmail.com> wrote:
> >>>>>>
> >>>>>>> I think it should return an "undefined" symbol. There is no angle
> >>>>>>> between two zero vectors.
> >>>>>>>
> >>>>>>> In a practical sense, taking two zero vectors to be equivalent in
> >>>>>>> the context of, say, user-item vectors is dodgy in my opinion. That
> >>>>>>> is akin to saying "If we both hate everything on this restaurant's
> >>>>>>> menu, we are the same person."
> >>>>>>>
> >>>>>>>
> >>>>>>> On Thu, Apr 4, 2013 at 11:56 AM, Dan Filimon <dangeorge.fili...@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> Suneel is right. :)
> >>>>>>>>
> >>>>>>>> Let me explain how this came up:
> >>>>>>>> - When clustering and assigning a point to a cluster, the centroid
> >>>>>>>> needs to be updated.
> >>>>>>>> - To update the centroid in the nearest-neighbor searcher classes,
> >>>>>>>> the centroid must first be removed.
> >>>>>>>> - To remove the centroid, we search for the closest vector (which
> >>>>>>>> should be the centroid itself) and then remove it from the data
> >>>>>>>> structures.
> >>>>>>>> => However, when the centroid is 0, the nearest vector (which should
> >>>>>>>> be itself) is at a large distance (1 rather than 0), and this trips
> >>>>>>>> a check.
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Thu, Apr 4, 2013 at 9:46 PM, Sean Owen <sro...@gmail.com> wrote:
> >>>>>>>>
> >>>>>>>>> It sounds pretty undefined, but I would tend to define the
> >>>>>>>>> distance as 0 in this case, of course. And that means defining the
> >>>>>>>>> cosine as 1. Which class in particular? There are a few
> >>>>>>>>> implementations of this distance measure.
> >>>>>>>>>
> >>>>>>>>> On Thu, Apr 4, 2013 at 7:42 PM, Dan Filimon <dangeorge.fili...@gmail.com> wrote:
> >>>>>>>>>
> >>>>>>>>>> In the case where both vectors are all zeros, the angle between
> >>>>>>>>>> them is 0, so the cosine is 1 and the distance returned should
> >>>>>>>>>> be 0 (unless I misunderstood what the distance does).
> >>>>>>>>>>
> >>>>>>>>>> In Mahout, however, when calling distance(), if both the
> >>>>>>>>>> denominator and dotProduct are 0 (which is true when both vectors
> >>>>>>>>>> are 0), the returned value is 1.
> >>>>>>>>>>
> >>>>>>>>>> This looks like a bug to me, and I would open a JIRA issue and
> >>>>>>>>>> fix it, but I want to make sure there's nothing I could possibly
> >>>>>>>>>> be missing.
> >>>>>>>>>>
> >>>>>>>>>> Thoughts?
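To make the centroid-removal failure Dan describes concrete, here is a small sketch under stated assumptions: a brute-force searcher over double[] arrays stands in for Mahout's nearest-neighbor searcher classes, and the names RemoveCentroidSketch, removeClosest, and cosineDistance are hypothetical. The zeroVsZero parameter selects between the current convention (1) and the proposed one (0).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

// Hypothetical stand-in for a nearest-neighbor searcher; names are illustrative, not Mahout's.
public final class RemoveCentroidSketch {

  /** Cosine distance; zeroVsZero is the value returned when both inputs are all zeros. */
  static double cosineDistance(double[] a, double[] b, double zeroVsZero) {
    double dot = 0.0;
    double na = 0.0;
    double nb = 0.0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      na += a[i] * a[i];
      nb += b[i] * b[i];
    }
    if (na == 0.0 && nb == 0.0) {
      return zeroVsZero;
    }
    if (na == 0.0 || nb == 0.0) {
      return 1.0; // assumption: zero vs. nonzero is treated as orthogonal
    }
    double cosine = Math.max(-1.0, Math.min(1.0, dot / Math.sqrt(na * nb)));
    return 1.0 - cosine;
  }

  private final List<double[]> points = new ArrayList<>();
  private final ToDoubleBiFunction<double[], double[]> distance;

  RemoveCentroidSketch(ToDoubleBiFunction<double[], double[]> distance) {
    this.distance = distance;
  }

  void add(double[] v) {
    points.add(v);
  }

  /** Removes the stored vector closest to v; throws if that vector is not (numerically) v itself. */
  boolean removeClosest(double[] v, double epsilon) {
    int best = -1;
    double bestDistance = Double.POSITIVE_INFINITY;
    for (int i = 0; i < points.size(); i++) {
      double d = distance.applyAsDouble(v, points.get(i));
      if (d < bestDistance) {
        bestDistance = d;
        best = i;
      }
    }
    if (best < 0) {
      return false; // nothing stored
    }
    if (bestDistance > epsilon) {
      // The sanity check from the thread: the closest match should be the query itself.
      throw new IllegalStateException("closest vector is at distance " + bestDistance);
    }
    points.remove(best);
    return true;
  }

  public static void main(String[] args) {
    double[] zeroCentroid = {0.0, 0.0};
    double[] other = {1.0, 1.0};

    RemoveCentroidSketch proposed = new RemoveCentroidSketch((a, b) -> cosineDistance(a, b, 0.0));
    proposed.add(other);
    proposed.add(zeroCentroid);
    System.out.println(proposed.removeClosest(zeroCentroid, 1e-9)); // prints true

    RemoveCentroidSketch current = new RemoveCentroidSketch((a, b) -> cosineDistance(a, b, 1.0));
    current.add(other);
    current.add(zeroCentroid);
    try {
      current.removeClosest(zeroCentroid, 1e-9);
    } catch (IllegalStateException e) {
      // prints "check tripped: closest vector is at distance 1.0"
      System.out.println("check tripped: " + e.getMessage());
    }
  }
}
```

Under the current convention every stored vector, including the zero centroid itself, is at distance 1 from a zero query, so the search cannot single out the centroid and the check fires; with d(0, 0) = 0 the centroid is found and removed.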