Right, that's fair. So you're saying the recommender system needs a special value for the case where both vectors are 0? And that treating 0 as "dislike" isn't actually accurate, because what you want to convey is a lack of information.
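One way to convey "no information" without encoding it as a number is to make the undefined case explicit in the return type. This follows Andrew's suggestion, quoted further down, of returning an "undefined" symbol. The sketch below is hypothetical and is not Mahout's API. It assumes plain `double[]` rating vectors, and the class and method names are illustrative only.

```java
import java.util.OptionalDouble;

/**
 * Hypothetical sketch (not Mahout's API): cosine similarity that reports
 * "no information" explicitly instead of encoding it as a number.
 */
final class CosineSimilaritySketch {

    private CosineSimilaritySketch() {}

    /**
     * Returns the cosine of the angle between a and b, or OptionalDouble.empty()
     * when either vector is all zeros, since no angle is defined in that case.
     */
    static OptionalDouble similarity(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double normSqA = 0.0;
        double normSqB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normSqA += a[i] * a[i];
            normSqB += b[i] * b[i];
        }
        if (normSqA == 0.0 || normSqB == 0.0) {
            // A zero vector means no observed interactions: report "no information".
            return OptionalDouble.empty();
        }
        double cosine = dot / (Math.sqrt(normSqA) * Math.sqrt(normSqB));
        // Clamp floating-point rounding so the result stays within [-1, 1].
        return OptionalDouble.of(Math.max(-1.0, Math.min(1.0, cosine)));
    }

    public static void main(String[] args) {
        double[] noInteractions = {0.0, 0.0, 0.0};
        double[] likesLentilSoup = {0.0, 5.0, 0.0};
        OptionalDouble s = similarity(noInteractions, likesLentilSoup);
        // Prints "no information" because one vector is all zeros.
        System.out.println(s.isPresent() ? String.valueOf(s.getAsDouble()) : "no information");
    }
}
```

An empty result forces the caller to decide what "no information" means, instead of silently reading it as 0 (unrelated) or 1 (identical). The trade-off is that this no longer fits an interface that must return a plain `double` for every pair. That is the constraint Dan raises about using the class through the DistanceMeasure interface.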
But right now the code returns 1. Is that a special value? I'd guess it means you like it by default...?

On Fri, Apr 5, 2013 at 12:11 AM, Sebastian Schelter <ssc.o...@googlemail.com> wrote:
> In recommender systems, it's dangerous to interpret "no interaction" as
> dislike. Think of all the movies you never watched: do you really dislike
> them all? :)
>
>
> On 04.04.2013 23:03, Andrew Musselman wrote:
> > I agree; I misspoke before if I said "dislike". Zero to me means
> > literally nothing. No interaction. Which could be either "don't like",
> > "don't like today", "dislike", etc. Which adds to the meaninglessness of
> > it.
> >
> >
> > On Thu, Apr 4, 2013 at 2:00 PM, Sebastian Schelter
> > <ssc.o...@googlemail.com> wrote:
> >
> >> I think that in our recommender code, 0 should mean no rating or no
> >> interaction observed. I think modeling dislike with 0 creates a lot of
> >> unnecessary problems.
> >>
> >> On 04.04.2013 22:56, Andrew Musselman wrote:
> >>> I see the arguments for having it defined; I'm just raising the point
> >>> that it's a very strange spot to be in.
> >>>
> >>> If all users are zero except for one person who likes the lentil soup,
> >>> then the other users are equally different from that person.
> >>>
> >>> The problem for me is the discontinuity Sean mentions, where at zero you
> >>> go off a cliff and have no sense of distance.
> >>>
> >>> But for convenience and "behaving nicely" I'm fine with the distance
> >>> between zero vectors being zero.
> >>>
> >>>
> >>> On Thu, Apr 4, 2013 at 1:50 PM, Dan Filimon <dangeorge.fili...@gmail.com> wrote:
> >>>
> >>>> While I agree that it's fairly meaningless mathematically, this ensures
> >>>> that "the distance between two identical vectors is always 0" holds.
> >>>> Think of yourself using this class through the DistanceMeasure interface.
> >>>> The implicit expectation [1] here is that d(x, y) = 0 iff x = y.
> >>>>
> >>>> [1] http://en.wikipedia.org/wiki/Metric_(mathematics)
> >>>>
> >>>>
> >>>> On Thu, Apr 4, 2013 at 11:40 PM, Andrew Musselman <andrew.mussel...@gmail.com> wrote:
> >>>>
> >>>>> I think it should return an "undefined" symbol. There is no angle between
> >>>>> two zero vectors.
> >>>>>
> >>>>> In a practical sense, taking two zero vectors to be equivalent in the
> >>>>> context of user-item vectors, say, is dodgy in my opinion. That is akin to
> >>>>> saying "If we both hate everything on this restaurant's menu we are the
> >>>>> same person."
> >>>>>
> >>>>>
> >>>>> On Thu, Apr 4, 2013 at 11:56 AM, Dan Filimon <dangeorge.fili...@gmail.com> wrote:
> >>>>>
> >>>>>> Suneel is right. :)
> >>>>>>
> >>>>>> Let me explain how this came up:
> >>>>>> - When clustering, and assigning a point to a cluster, the centroid needs
> >>>>>> to be updated.
> >>>>>> - To update the centroid in the nearest neighbor searcher classes, the
> >>>>>> centroid must first be removed.
> >>>>>> - To remove the centroid, we get the closest vector (search for it, and it
> >>>>>> should be itself) and then remove it from the data structures.
> >>>>>> => However, when the centroid is 0, the nearest vector (which should be
> >>>>>> itself) has a huge distance (1 rather than 0) and this trips a check.
> >>>>>>
> >>>>>>
> >>>>>> On Thu, Apr 4, 2013 at 9:46 PM, Sean Owen <sro...@gmail.com> wrote:
> >>>>>>
> >>>>>>> It sounds pretty undefined, but I would tend to define the distance as
> >>>>>>> 0 in this case of course. And that means defining the cosine as 1.
> >>>>>>> Which class in particular? There are a few implementations of this
> >>>>>>> distance measure.
> >>>>>>>
> >>>>>>> On Thu, Apr 4, 2013 at 7:42 PM, Dan Filimon <dangeorge.fili...@gmail.com> wrote:
> >>>>>>>> In the case where both vectors are all zeros, the angle between them is
> >>>>>>>> 0, so the cosine is therefore 1 and so the distance returned should be 0
> >>>>>>>> (unless I misunderstood what the distance does).
> >>>>>>>>
> >>>>>>>> In Mahout, when calling distance() however, if both the denominator and
> >>>>>>>> dotProduct are 0 (which is true when both vectors are 0), the returned
> >>>>>>>> value is 1.
> >>>>>>>>
> >>>>>>>> This looks like a bug to me and I would open a JIRA issue and fix it, but
> >>>>>>>> I want to make sure there's nothing I could possibly be missing.
> >>>>>>>>
> >>>>>>>> Thoughts?
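The convention Sean and Dan converge on is to define the cosine of two zero vectors as 1, so their distance is 0. It can be sketched as follows. This is a hypothetical sketch over plain `double[]` arrays and is not Mahout's actual distance() implementation. The class name is illustrative.

One detail worth checking: dotProduct and the denominator are both 0 whenever *either* vector is zero, not only when both are. A check on those two values alone would therefore also put a zero vector at distance 0 from every other vector. The sketch tests the two norms separately instead. Returning 1 when exactly one vector is zero is an assumption made here for illustration.

```java
/**
 * Hypothetical sketch (not Mahout's actual implementation): cosine distance
 * 1 - cos(a, b) with the convention d(0, 0) = 0 proposed in this thread.
 */
final class CosineDistanceSketch {

    private CosineDistanceSketch() {}

    static double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double normSqA = 0.0;
        double normSqB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normSqA += a[i] * a[i];
            normSqB += b[i] * b[i];
        }
        boolean aIsZero = normSqA == 0.0;
        boolean bIsZero = normSqB == 0.0;
        if (aIsZero && bIsZero) {
            // Angle undefined; pick cosine = 1 so that d(x, x) = 0 also holds for x = 0.
            return 0.0;
        }
        if (aIsZero || bIsZero) {
            // Assumption: a single zero vector is treated as unrelated (cosine 0, distance 1).
            // Testing only dot == 0 && denominator == 0 would wrongly return 0 here.
            return 1.0;
        }
        double cosine = dot / (Math.sqrt(normSqA) * Math.sqrt(normSqB));
        // Clamp rounding error so the distance stays within [0, 2].
        return 1.0 - Math.max(-1.0, Math.min(1.0, cosine));
    }

    public static void main(String[] args) {
        double[] zeroCentroid = {0.0, 0.0, 0.0};
        double[] point = {1.0, 2.0, 0.0};
        // A zero centroid searching for itself now finds distance 0 instead of 1.
        System.out.println(distance(zeroCentroid, zeroCentroid)); // prints 0.0
        System.out.println(distance(zeroCentroid, point));        // prints 1.0
    }
}
```

With this convention, the self-lookup in the nearest-neighbor searcher that Dan describes returns 0 for a zero centroid, so d(x, x) = 0 holds for every x. Cosine distance is still not a true metric, though: d(x, 2x) is also (up to rounding) 0 for any nonzero x.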