On Thu, Aug 20, 2009 at 4:55 PM, Mark Desnoyer<[email protected]> wrote:
> What about defining a small prior similarity value between all items? In
> this case, the Lincoln book and the cookbook would start with some small
> similarity like 0.01 and as more users connect the books, this value gets
> swamped with the true value. It's the same concept that's used in Beta or
> Dirichlet distributions.
The core of what you're suggesting, I think, is that the similarity value increases as the number of users connected to both items increases? And then it doesn't even depend on the rating values. Yes, actually I think this works well (and Ted would agree, I believe). Or, you could say that this attacks the very problem highlighted by this scenario: that the stock algorithm takes no account of this number.

> Anyway, in the case of this algorithm, if there is no user data between
> Lincoln books and the cookbook, then the resulting preference would just be
> the average of all the user's previous ratings. If there is some weak

That's a variant: fall back on the average of the user's ratings. I personally don't like it -- I would rather just disqualify the items from recommendations. But it's most certainly plausible. Assuming the top recommendations have values that are far from 'average', and that's reasonable, it won't matter whether you reject these items or give them a middling score, which almost certainly puts them out of the top recommendations.

> similarity, say 0.1 with a rating of 5, then you'd skew the resulting
> preference score higher, but it won't go all the way to 5.0. How much it
> skews is controlled by the strength of the prior relative to the similarity
> from the data.

I think you're sort of suggesting not to normalize by the weights in the weighted average? Right now it does sort of do this -- multiplies the 5 by 0.1. But in the end it divides through by the 0.1. You could skip that division; then the results aren't really estimated preferences, since they won't necessarily map into the 1-5 range in this example, for instance. But then you just map the results back into this range. Yeah, this sort of transform is what I stuck onto the Pearson correlation to account for a similar phenomenon. I'll look at adapting that, which is, I think, roughly what you are describing.
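To make the prior idea above concrete, here is a minimal sketch of one way to blend a small prior similarity with the data-derived similarity so that the prior is swamped as the number of co-rating users grows. The class, method names, prior value, and pseudo-count are illustrative assumptions, not the library's actual API.

// Illustrative sketch only: a small prior similarity (e.g. 0.01) that gets
// swamped by the data-derived similarity as the number of users who rated
// both items grows. Names and constants are hypothetical.
public class SmoothedSimilarity {

  /**
   * @param dataSimilarity   similarity computed from co-ratings (e.g. Pearson)
   * @param numCoRatingUsers how many users rated both items
   * @param priorSimilarity  small default similarity assumed between any two items
   * @param priorStrength    pseudo-count controlling how fast data overrides the prior
   */
  static double smoothedSimilarity(double dataSimilarity,
                                   int numCoRatingUsers,
                                   double priorSimilarity,
                                   double priorStrength) {
    // Weighted blend: with no co-rating users the result is the prior;
    // as numCoRatingUsers grows, the data-derived similarity dominates.
    return (priorStrength * priorSimilarity + numCoRatingUsers * dataSimilarity)
        / (priorStrength + numCoRatingUsers);
  }

  public static void main(String[] args) {
    // The Lincoln book vs. cookbook case: a single user connects the two items
    // with a perfect correlation, but the count-aware blend keeps the
    // similarity near the prior until more users connect them.
    System.out.println(smoothedSimilarity(1.0, 1, 0.01, 10.0));   // ~0.10
    System.out.println(smoothedSimilarity(1.0, 500, 0.01, 10.0)); // ~0.98
  }
}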

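And on the normalization point, here is a rough sketch of the two variants being discussed: the stock weighted average that divides by the total similarity (so one 0.1-similarity neighbor rated 5.0 still yields 5.0), versus keeping the raw weighted sum and linearly rescaling it into the 1-5 range. The names and the rescaling bounds are assumptions for illustration, not the project's actual code.

// Illustrative sketch only: normalized vs. un-normalized estimates.
public class EstimateVariants {

  /** Stock estimate: weighted average of ratings, normalized by total similarity. */
  static double normalizedEstimate(double[] similarities, double[] ratings) {
    double weightedSum = 0.0;
    double totalWeight = 0.0;
    for (int i = 0; i < ratings.length; i++) {
      weightedSum += similarities[i] * ratings[i];
      totalWeight += similarities[i];
    }
    return weightedSum / totalWeight;
  }

  /** Variant: skip the division, then linearly rescale raw scores into the rating range. */
  static double rescale(double rawScore, double minRaw, double maxRaw,
                        double minRating, double maxRating) {
    return minRating + (rawScore - minRaw) / (maxRaw - minRaw) * (maxRating - minRating);
  }

  public static void main(String[] args) {
    double[] sims = {0.1};
    double[] ratings = {5.0};
    // Normalized: (0.1 * 5.0) / 0.1 = 5.0, regardless of how weak the evidence is.
    System.out.println(normalizedEstimate(sims, ratings));
    // Un-normalized raw score is only 0.5; assuming raw scores span [0, 5],
    // mapping into [1, 5] puts it near the bottom of the range.
    System.out.println(rescale(0.1 * 5.0, 0.0, 5.0, 1.0, 5.0)); // 1.4
  }
}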