On Fri, Nov 27, 2009 at 11:23 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> Summarize yes.
>
> But this is, actually, theoretically better because the summarization
> introduces useful smoothing.  That way you get recommendations for items
> even if there is no direct overlap.
>

Summarize, smooth, and enhance clustering: distances are *not* preserved in
truncated decompositions, and the *hope* is that the "meaningful" distances
shrink while the less meaningful ones do not.

This can be seen in a simple example of user preferences (on the Netflix
scale of 1-5):

  user1: item1=4, item2=1, item3=5
  user2: item2=1, item5=1, item7=3, item8=3
  user3: item4=4, item5=1, item6=5

A first-order recommender can't infer any similarity or dissimilarity
between user1 and user3 (although it can see some similarity between user1
and user2, and between user2 and user3).  A decomposing recommender will
notice that user1 and user2 both hated item2, and that another item user2
hated (item5) is the same item user3 hated, and infer transitive
similarity: not just to 2nd degree as in this example, but to nth degree.
(A tiny numerical sketch of this is appended after the quoted text below.)

The difference between the various decompositional approaches is how they
approximate these transitive similarities: LDA should do best in the
very-low-overlap case, and SVD (more precisely, a sparse SVD which doesn't
treat missing data as a numerical 0 or as the mean of the observed values)
approaches that level of quality as the data gets bigger (and SVD /
randomized SVD should be a lot faster than LDA on really big data).

What I'd really like to see (once I get this decomposer stuff in - soon!
We've got good linear primitives now, so I'm working on it!) is also a
Restricted Boltzmann Machine based recommender, because that makes the
final leap from linear and quasi-linear decompositions to the truly
nonlinear case.  (A friend of mine on the executive team at Netflix tells
me it was apparent pretty early on that the winners were going to be
blends of the RBM and SVD-based approaches - and he was right!)

  -jake

> Your point about noisy is trenchant because small count data is inherently
> noisy because you can't have an exact 0.04 of an observation.  Small counts
> dominate in recommendations.
>
> On Fri, Nov 27, 2009 at 10:00 PM, Sean Owen <sro...@gmail.com> wrote:
>
> > Correct me if I'm wrong, but my impression of matrix factorization
> > approaches is that they're just a way to effectively "summarize" input
> > data.  They're not a theoretically better, or even different, approach
> > to recommendation, but more a transformation of the input into
> > conventional algorithms.  (Though this process of simplification could,
> > I imagine, sometimes be an improvement on the input, if it's noisy.)
>
> --
> Ted Dunning, CTO
> DeepDyve
>
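P.S.  Since the toy example above is easy to poke at, here is a minimal
numpy sketch of the mechanism (nothing Mahout-specific; the variable and
helper names are just for illustration).  It fills missing ratings with 0,
which is exactly the shortcut a real sparse SVD should avoid, so take it as
a sketch of the mechanism rather than the numbers:

import numpy as np

# Toy ratings from the example above (rows = user1..user3, cols = item1..item8).
# Missing ratings are filled with 0 only to keep the sketch short; this is the
# naive treatment of missing data, not what a proper sparse SVD would do.
A = np.array([
    [4, 1, 5, 0, 0, 0, 0, 0],  # user1
    [0, 1, 0, 0, 1, 0, 3, 3],  # user2
    [0, 0, 0, 4, 1, 5, 0, 0],  # user3
], dtype=float)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# First-order similarity: user1 and user3 rate no items in common, so their
# raw cosine similarity is exactly 0 regardless of the rating values.
print("raw cos(user1, user3) =", cosine(A[0], A[2]))

# Truncated SVD: keep the top k singular triples and describe each user by
# its k latent-factor coordinates (row of U_k scaled by the singular values).
k = 1
U, s, Vt = np.linalg.svd(A, full_matrices=False)
user_factors = U[:, :k] * s[:k]

# In the latent space the zero is no longer structural: users with no items
# in common can still end up close, because the factors are built from the
# whole matrix at once.  (With only three users this toy is degenerate, so
# what matters is that the similarity is no longer pinned at 0.)
print("latent cos(user1, user3) =", cosine(user_factors[0], user_factors[2]))

On real data you'd use a rank far smaller than the number of users and
items, handle the missing entries properly (and probably center the
ratings), but the transitive similarity described above comes from the
same place.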