If you are going with this approach, I think the prior needs considerably
more weight (the equivalent of several ratings at a below-average level).
That way, slight correlations will result in average or below-average
recommendations.
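
Concretely, I'm imagining something like this rough Python sketch (the
prior level and weights are invented numbers, just to show the effect of
counting the prior as several below-average pseudo-ratings):

def estimate(sims, ratings, prior_rating=2.0, prior_weight=0.05, k=3):
    # weighted-average estimate where the prior counts as k pseudo-ratings
    # at a below-average rating level
    num = sum(s * r for s, r in zip(sims, ratings))
    den = sum(sims)
    num += k * prior_weight * prior_rating
    den += k * prior_weight
    return num / den

# a single modest correlation no longer produces a top recommendation
print(estimate([0.1], [5.0]))  # ~3.2 instead of 5.0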

One of the worst things a recommendation engine can do is make ludicrous
recommendations.  That turns off users and it turns off decision makers even
faster.

On Fri, Aug 21, 2009 at 8:11 AM, Mark Desnoyer <[email protected]> wrote:

> Well, mathematically, right now, by omitting the unknown similarities you
> are effectively setting them to zero, and thus they drop out of the
> calculation. Unless there's some other calculation I don't know about...
>
> If you want a more theoretically sound version of what I'm trying to do
> (instead of my hacked up way), see section 2.2.1 in:
>
> http://research.microsoft.com/pubs/69656/tr-98-12.pdf
>
> And yes, if you have a default similarity, you will change the result for
> different entries. In the above example, say there is another book, maybe
> about fishing, that we also want to estimate the preference for. Say this
> book has a similarity of 0.1 with the Lincoln book and 0.1 with the France
> book. Without the prior, your estimate is:
>
> score(fishing) = (0.1*5 + 0.1*3) / (0.1 + 0.1) = 4
>
> With the prior, it's:
>
> score(fishing) = (0.1*5 + 0.1*3 + 0.01*1) / (0.1 + 0.1 + 0.01) ≈ 3.86
>
> This is 96% of the original value, whereas for the cookbook example, it's
> 90%. They aren't moving in lockstep because the impact of the prior is
> different depending on which entries we have real data for.
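>
> If it's easier to see in code, here's a quick Python sketch of that
> arithmetic (same made-up similarities and the 0.01 prior):
>
> def score(sims, ratings):
>     # weighted average of the ratings, weighted by item-item similarity
>     return sum(s * r for s, r in zip(sims, ratings)) / sum(sims)
>
> ratings = [5, 3, 1]  # Lincoln, France, space travel
> print(score([0.1, 0.1], [5, 3]))          # fishing, no prior: 4.0
> print(score([0.1, 0.1, 0.01], ratings))   # fishing, with prior: ~3.86
> print(score([0.1, 0.01, 0.01], ratings))  # cookbook, with prior: 4.5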
>
> -Mark
>
> On Fri, Aug 21, 2009 at 5:02 AM, Sean Owen <[email protected]> wrote:
>
> > The only piece of what you're saying that sort of doesn't click with
> > me is assuming a prior of 0, or some small value. Similarities range
> > from -1 to 1, so a value of 0 is a positive statement that there is
> > exactly no relationship between the ratings for both items. This is
> > different from having no opinion on it, and it matters to the
> > subsequent calculations.
> >
> > But does your second suggestion really meaningfully change the
> > result... yeah, it pushes the rating down from 5 to 4.5, but doesn't
> > throwing these other small terms into the average do roughly the same
> > to all the values? Perhaps I haven't thought this through. I understand
> > the intuitions here and think they are right.
> >
> >
> > One meta-issue here for the library is this: there's a 'standard'
> > item-based recommender algorithm out there, and we want to have that.
> > And we do. So I don't want to touch it -- perhaps add some options to
> > modify its behavior. So we're talking about maybe inventing a variant
> > algorithm... or three or four. That's good I guess, though not exactly
> > the remit I had in mind for the CF part. I was imagining it would
> > provide access to canonical approaches with perhaps some small
> > variants tacked on, or at least hooks to modify parts of the logic.
> >
> > Basically I also need to have a think about how to include variants
> > like this in a logical way.
> >
> >
> > On Thu, Aug 20, 2009 at 6:07 PM, Mark Desnoyer <[email protected]>
> > wrote:
> > > You could do it that way, but I don't think you're restricted to
> > > ignoring the rating values. For example, you could define the
> > > similarity between item i and item j like this (the normalization is
> > > probably incomplete, but this is the idea):
> > >
> > > similarity(i,j) = (prior + sum({x_ij})) / (count({x_ij}) + 1)
> > >
> > > where each x_ij is the similarity defined by a single user and could
> > > be based on their ratings. So I think the way you're thinking of it,
> > > x_ij = 1, but it could be a function of the ratings, say higher if the
> > > ratings are closer and lower if they are far apart.
> > >
> > > You can still do the weighted average, you just have more items to
> > > calculate. Say a user has rated the Lincoln book 5, a book on France
> > > 3, and a book on space travel 1. Assuming there is no data linking the
> > > France or space books to the cookbook, their similarities would be the
> > > prior, or 0.01. Then you'd calculate the score for the cookbook
> > > recommendation as:
> > >
> > > score(cookbook) = (5*0.1 + 3*0.01 + 1*0.01) / (0.1 + 0.01 + 0.01) = 4.5
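> > >
> > > A little Python sketch of that, if it helps (the 0.01 prior and the
> > > example x_ij values here are just made up, and as I said the
> > > normalization is probably incomplete):
> > >
> > > def similarity_with_prior(per_user_sims, prior=0.01):
> > >     # per_user_sims holds the x_ij values, one per co-rating user;
> > >     # the prior acts like one extra pseudo-observation
> > >     return (prior + sum(per_user_sims)) / (len(per_user_sims) + 1)
> > >
> > > print(similarity_with_prior([]))          # no co-raters: just the prior, 0.01
> > > print(similarity_with_prior([0.9, 0.7]))  # real evidence dominates, ~0.54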
> >
>



-- 
Ted Dunning, CTO
DeepDyve
