On Thu, Aug 20, 2009 at 4:55 PM, Mark Desnoyer<[email protected]> wrote:
> What about defining a small prior similarity value between all items? In
> this case, the Lincoln book and the cookbook would start with some small
> similarity like 0.01 and as more users connect the books, this value gets
> swamped with the true value. It's the same concept that's used in Beta or
> Dirichlet distributions.
The core of what you're suggesting, I think, is that the similarity value increases as the number of users connected to both items increases? And then it doesn't even depend on the rating values. Yes, actually I think this works well (and Ted would agree, I believe). Or, you could say that this attacks the very problem highlighted by this scenario: that the stock algorithm takes no account of this number.

> Anyway, in the case of this algorithm, if there is no user data between
> Lincoln books and the cookbook, then the resulting preference would just be
> the average of all the user's previous ratings. If there is some weak

That's a variant: fall back on the average of the user's ratings. I personally don't like it -- I would rather just disqualify the items from recommendations. But it's most certainly plausible. Assuming the top recommendations have values that are far from 'average', and that's reasonable, it won't matter whether you reject these items or give them a middling score, which almost certainly puts them out of the top recommendations.

> similarity, say 0.1 with a rating of 5, then you'd skew the resulting
> preference score higher, but it won't go all the way to 5.0. How much it
> skews is controlled by the strength of the prior relative to the similarity
> from the data.

I think you're sort of suggesting not to normalize by the weights in the weighted average? Right now it does sort of do this -- multiplies the 5 by 0.1. But in the end it divides through by the 0.1. You could skip that division; then the results aren't really estimated preferences, since they won't necessarily map into the 1-5 range in this example, for instance. But then you just map the results back into this range. Yeah, this sort of transform is what I stuck onto the Pearson correlation to account for a similar phenomenon. I'll look at adapting that, which is, I think, roughly what you are describing.
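To make the prior idea above concrete, here is a minimal sketch of one way to blend a small prior similarity with the data-derived similarity so that the prior is swamped as the number of co-rating users grows. The class, method names, prior value, and pseudo-count are illustrative assumptions, not the library's actual API.

// Illustrative sketch only: a small prior similarity (e.g. 0.01) that gets
// swamped by the data-derived similarity as the number of users who rated
// both items grows. Names and constants are hypothetical.
public class SmoothedSimilarity {

  /**
   * @param dataSimilarity   similarity computed from co-ratings (e.g. Pearson)
   * @param numCoRatingUsers how many users rated both items
   * @param priorSimilarity  small default similarity assumed between any two items
   * @param priorStrength    pseudo-count controlling how fast data overrides the prior
   */
  static double smoothedSimilarity(double dataSimilarity,
                                   int numCoRatingUsers,
                                   double priorSimilarity,
                                   double priorStrength) {
    // Weighted blend: with no co-rating users the result is the prior;
    // as numCoRatingUsers grows, the data-derived similarity dominates.
    return (priorStrength * priorSimilarity + numCoRatingUsers * dataSimilarity)
        / (priorStrength + numCoRatingUsers);
  }

  public static void main(String[] args) {
    // The Lincoln book vs. cookbook case: a single user connects the two items
    // with a perfect correlation, but the count-aware blend keeps the
    // similarity near the prior until more users connect them.
    System.out.println(smoothedSimilarity(1.0, 1, 0.01, 10.0));   // ~0.10
    System.out.println(smoothedSimilarity(1.0, 500, 0.01, 10.0)); // ~0.98
  }
}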

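And on the normalization point, here is a rough sketch of the two variants being discussed: the stock weighted average that divides by the total similarity (so one 0.1-similarity neighbor rated 5.0 still yields 5.0), versus keeping the raw weighted sum and linearly rescaling it into the 1-5 range. The names and the rescaling bounds are assumptions for illustration, not the project's actual code.

// Illustrative sketch only: normalized vs. un-normalized estimates.
public class EstimateVariants {

  /** Stock estimate: weighted average of ratings, normalized by total similarity. */
  static double normalizedEstimate(double[] similarities, double[] ratings) {
    double weightedSum = 0.0;
    double totalWeight = 0.0;
    for (int i = 0; i < ratings.length; i++) {
      weightedSum += similarities[i] * ratings[i];
      totalWeight += similarities[i];
    }
    return weightedSum / totalWeight;
  }

  /** Variant: skip the division, then linearly rescale raw scores into the rating range. */
  static double rescale(double rawScore, double minRaw, double maxRaw,
                        double minRating, double maxRating) {
    return minRating + (rawScore - minRaw) / (maxRaw - minRaw) * (maxRating - minRating);
  }

  public static void main(String[] args) {
    double[] sims = {0.1};
    double[] ratings = {5.0};
    // Normalized: (0.1 * 5.0) / 0.1 = 5.0, regardless of how weak the evidence is.
    System.out.println(normalizedEstimate(sims, ratings));
    // Un-normalized raw score is only 0.5; assuming raw scores span [0, 5],
    // mapping into [1, 5] puts it near the bottom of the range.
    System.out.println(rescale(0.1 * 5.0, 0.0, 5.0, 1.0, 5.0)); // 1.4
  }
}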