On Tue, Jan 26, 2010 at 3:36 PM, Ted Dunning <ted.dunn...@gmail.com> wrote:
> I define it a bit differently by redefining recommendations as machine
> learning.
>
> On Tue, Jan 26, 2010 at 1:44 PM, Sean Owen <sro...@gmail.com> wrote:
>
> > I would narrow and specify this, in the context of Mahout, to have a
> > collaborative filtering angle:
>

Since Ted (Mr. Machine Learning) wants to describe content-based recommendations as machine learning, and Sean (Mr. Taste/CF) goes and describes it in terms of collaborative filtering, I suppose I'll put on my "search guy" hat and describe it the way I see it:

Items have attributes (e.g. text features), users express preferences for some attributes (e.g. by explicitly entering text keywords), and the recommender (a.k.a. search engine) takes those preferences and returns a ranked list of the items that best match them.

Generalizing a bit beyond that example, users may not make explicit mention of certain attributes, but we may infer them from some other source (a user on a social network may have a profile, a member of a dating website may have answered a questionnaire expressing some preferences, etc.) and use these to generate a "query" against the recommender.

There is no need (although there may be much *utility*) in ever thinking about interactions between items (item-item similarity) or users. Content-based recommendations can act purely as a generalized search engine, where the trick is just coming up with the search terms / query features to use for each user.

An advantage of thinking of it this way is that you don't need to think about "users" at all: you can have recommendations of items of type A against items of type B:

* on a webpage (type W), you have a certain set of features, and users come to that webpage, sometimes with no prior history, so if you want to recommend (serve) ads (type A) to the user, recommending based purely on some kind of content-based correlation between items of type W and A can work.
* on a job board, recruiters can post job listings (type J), and you want to recommend possible resumes (type R) to the job (*not* to the recruiter, because the recruiter has distinctly different "preferences" for each job - the *job* is the thing which wants recommendations).

In both of these cases, you can do a full-fledged recommendation engine with no users whatsoever, with content and item information across multiple domains.

The other advantage of thinking of content-based recommender systems this way is that now you have an entirely new axis to think about: CF goes one way, and content-based "searching" goes another, and there is an entire spectrum of "fusion" models which mix the two.

(Of course, this leaves out one further piece of information which is similar to CF, but deserves its own treatment: explicit link information, available in the form of web-graph links or social network links. Recommenders based on this information can look a lot like CF, but they use *explicit* user-user or item-item correlations instead of ones inferred implicitly from co-occurrence / usage.)

-jake
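
P.S. To make the "recommender as search engine" idea concrete, here is a minimal sketch of the job-board case: the job listing's text is the query, the resumes are the items, and the ranking is plain bag-of-words cosine similarity. Everything in it (the `features`, `cosine`, and `recommend` helpers, the tokenizer, the sample resumes) is made up for illustration; it is not Mahout or Lucene code, and a real system would presumably use an actual search index and better term weighting.

```python
import math
import re
from collections import Counter


def features(text):
    """Hypothetical feature extractor: lowercase bag of words (term -> count)."""
    return Counter(re.findall(r"[a-z0-9+#]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(query_text, items, top_n=3):
    """Rank items (id -> text) against a query built from another item's content.

    No users are involved: the "query" is just the feature vector of the
    thing that wants recommendations (here, a job listing).
    """
    query = features(query_text)
    scored = [(cosine(query, features(text)), item_id)
              for item_id, text in items.items()]
    # Drop items sharing no features with the query, then sort best-first
    # (ties broken by item id so the order is deterministic).
    scored = [(score, item_id) for score, item_id in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(item_id, round(score, 3)) for score, item_id in scored[:top_n]]


# Made-up sample data: resumes (type R) recommended to a job listing (type J).
resumes = {
    "r1": "java developer with hadoop and lucene search experience",
    "r2": "pastry chef, five years in restaurant kitchens",
    "r3": "python machine learning engineer, some hadoop",
}
job = "search engineer: lucene, hadoop, java"

ranked = recommend(job, resumes)
print(ranked)
```

With this sample data, r1 ranks ahead of r3 because it shares more terms with the listing, and r2 drops out because it shares none. Swapping the job text for a webpage and the resumes for ads gives the other example, with no change to the code.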