None of this normally matters when cosine distance is used, since it is usually applied to normalized vectors, and a normalized vector is never zero. On that set of vectors it is a well-defined measure.
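
To make the zero-vector case concrete, here is a minimal sketch in Python (not Mahout's actual Java code). The function name `cosine_distance` is hypothetical, and returning 1 when exactly one of the vectors is zero is my own assumption; the thread only settles the case where both are zero.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cos(angle between a and b), special-casing zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))

    if norm_a == 0.0 and norm_b == 0.0:
        # Convention proposed in this thread: two all-zero vectors are
        # identical, so their distance is 0 (not 1).
        return 0.0
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption (not settled in the thread): with exactly one zero
        # vector the angle is undefined; treat the pair as orthogonal.
        return 1.0

    cos = dot / (norm_a * norm_b)
    # Clamp to [-1, 1] to absorb floating-point rounding.
    cos = max(-1.0, min(1.0, cos))
    return 1.0 - cos
```

With unit-length inputs neither special case is ever reached, which is why the question rarely comes up in practice.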

On Thu, Apr 4, 2013 at 11:25 PM, Andrew Musselman <andrew.mussel...@gmail.com> wrote:

> I agree 1 is wrong :)
>
> On Thu, Apr 4, 2013 at 2:22 PM, Dan Filimon <dangeorge.fili...@gmail.com> wrote:
>
> > Ah, okay then. :)
> > I thought that you depend on the current convention that it returns 1. So,
> > disclaimers aside, you're fine with the change?
> >
> > On Fri, Apr 5, 2013 at 12:20 AM, Sebastian Schelter <ssc.o...@googlemail.com> wrote:
> >
> > > You can ignore the recommender stuff for the DistanceMeasure classes, as
> > > the recommenders use their own distance/similarity implementations.
> > >
> > > I just wanted to comment on the example that Andrew gave, to mention
> > > that there are some common pitfalls with modeling ratings/interactions.
> > >
> > > On 04.04.2013 23:14, Dan Filimon wrote:
> > > > Right, that's fair. So, you're saying there needs to be a special value
> > > > when both vectors are 0 for the recommender system to work?
> > > > And that 0 means dislike, which isn't in fact accurate. You want to
> > > > convey lack of information.
> > > >
> > > > But now, the code returns 1. Is that a special value? I'd guess it means
> > > > you like it by default...?
> > > >
> > > > On Fri, Apr 5, 2013 at 12:11 AM, Sebastian Schelter <ssc.o...@googlemail.com> wrote:
> > > >
> > > >> In recommender systems, it's dangerous to interpret "no interaction" as
> > > >> dislike. Think of all movies you never watched, do you really dislike
> > > >> them all? :)
> > > >>
> > > >> On 04.04.2013 23:03, Andrew Musselman wrote:
> > > >>> I agree; I mis-spoke before if I said "dislike". Zero to me means
> > > >>> literally nothing. No interaction. Which could be either "don't like",
> > > >>> "don't like today", "dislike", etc. Which adds to the meaninglessness
> > > >>> of it.
> > > >>>
> > > >>> On Thu, Apr 4, 2013 at 2:00 PM, Sebastian Schelter <ssc.o...@googlemail.com> wrote:
> > > >>>
> > > >>>> I think that in our recommender code, 0 should mean no rating or no
> > > >>>> interaction observed. I think modeling dislike with 0 creates a lot of
> > > >>>> unnecessary problems.
> > > >>>>
> > > >>>> On 04.04.2013 22:56, Andrew Musselman wrote:
> > > >>>>> I see the arguments for having it defined, just raising the point
> > > >>>>> that it's a very strange spot to be in.
> > > >>>>>
> > > >>>>> If all users are zero except for one person who likes the lentil
> > > >>>>> soup, then the other users are equally different from that person.
> > > >>>>>
> > > >>>>> The problem for me is the discontinuity Sean mentions, where at zero
> > > >>>>> you go off a cliff and have no sense of distance.
> > > >>>>>
> > > >>>>> But for convenience and "behaving nicely" I'm fine with distance
> > > >>>>> between zero vectors being zero.
> > > >>>>>
> > > >>>>> On Thu, Apr 4, 2013 at 1:50 PM, Dan Filimon <dangeorge.fili...@gmail.com> wrote:
> > > >>>>>
> > > >>>>>> While I agree that it's fairly meaningless mathematically, this
> > > >>>>>> ensures that the distance between two vectors that are the same is
> > > >>>>>> always 0.
> > > >>>>>> Think of yourself using this class through the DistanceMeasure
> > > >>>>>> interface.
> > > >>>>>> The implicit expectation [1] here is that d(x, y) = 0 iff x = y.
> > > >>>>>>
> > > >>>>>> [1] http://en.wikipedia.org/wiki/Metric_(mathematics)
> > > >>>>>>
> > > >>>>>> On Thu, Apr 4, 2013 at 11:40 PM, Andrew Musselman <andrew.mussel...@gmail.com> wrote:
> > > >>>>>>
> > > >>>>>>> I think it should return an "undefined" symbol. There is no angle
> > > >>>>>>> between two zero vectors.
> > > >>>>>>>
> > > >>>>>>> In a practical sense, taking two zero vectors to be equivalent in
> > > >>>>>>> the context of user-item vectors, say, is dodgy in my opinion. That
> > > >>>>>>> is akin to saying "If we both hate everything on this restaurant's
> > > >>>>>>> menu we are the same person."
> > > >>>>>>>
> > > >>>>>>> On Thu, Apr 4, 2013 at 11:56 AM, Dan Filimon <dangeorge.fili...@gmail.com> wrote:
> > > >>>>>>>
> > > >>>>>>>> Suneel is right. :)
> > > >>>>>>>>
> > > >>>>>>>> Let me explain how this came up:
> > > >>>>>>>> - When clustering, and assigning a point to a cluster, the
> > > >>>>>>>> centroid needs to be updated.
> > > >>>>>>>> - To update the centroid in the nearest neighbor searcher classes,
> > > >>>>>>>> the centroid must first be removed.
> > > >>>>>>>> - To remove the centroid, we get the closest vector (search for
> > > >>>>>>>> it, and it should be itself) and then remove it from the data
> > > >>>>>>>> structures.
> > > >>>>>>>> => However, when the centroid is 0, the nearest vector (which
> > > >>>>>>>> should be itself) has a huge distance (1 rather than 0) and this
> > > >>>>>>>> trips a check.
> > > >>>>>>>>
> > > >>>>>>>> On Thu, Apr 4, 2013 at 9:46 PM, Sean Owen <sro...@gmail.com> wrote:
> > > >>>>>>>>
> > > >>>>>>>>> It sounds pretty undefined, but I would tend to define the
> > > >>>>>>>>> distance as 0 in this case of course. And that means defining the
> > > >>>>>>>>> cosine as 1.
> > > >>>>>>>>> Which class in particular? There are a few implementations of
> > > >>>>>>>>> this distance measure.
> > > >>>>>>>>>
> > > >>>>>>>>> On Thu, Apr 4, 2013 at 7:42 PM, Dan Filimon <dangeorge.fili...@gmail.com> wrote:
> > > >>>>>>>>>> In the case where both vectors are all zeros, the angle between
> > > >>>>>>>>>> them is 0, so the cosine is therefore 1 and so the distance
> > > >>>>>>>>>> returned should be 0 (unless I misunderstood what the distance
> > > >>>>>>>>>> does).
> > > >>>>>>>>>>
> > > >>>>>>>>>> In Mahout, when calling distance() however, if both the
> > > >>>>>>>>>> denominator and dotProduct are 0 (which is true when both
> > > >>>>>>>>>> vectors are 0), the returned value is 1.
> > > >>>>>>>>>>
> > > >>>>>>>>>> This looks like a bug to me and I would open a JIRA issue and
> > > >>>>>>>>>> fix it but I want to make sure there's nothing I could possibly
> > > >>>>>>>>>> be missing.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Thoughts?