I define it a bit differently, by recasting recommendation as a machine-learning problem.
Users have preferences for objects with attributes. We would like to learn from all of the user/object/attribute preference data to predict so-far unobserved preferences of a user for other objects. Ordinary recommendation is the special case of this in which every object has exactly one attribute, an id.

We can extend most recommendation algorithms to this new paradigm relatively transparently by considering each expressed item preference to be a bundle of attribute preferences. Our recommendation algorithm then needs to produce a list of recommended attributes, which we integrate into a list of recommended items. The list of recommended attributes might be segregated into one list of values per kind of attribute, or it might be a single combined list. The segregated approach could simply replicate a recommendation engine per attribute type. The combined approach might label all attributes by type and throw them into one soup of preference data.

The additional code needed is mostly the piece that integrates the attribute recommendations back into a list of item recommendations. This can be as simple as weighting the recommended attributes by rank and doing rankScore * idf retrieval to find the items. Some algorithms, like LDA, can explicitly integrate the different kinds of attributes; others really can't.

One problem with this is that you are exploding the number of preferences, which can present scaling and noise problems. You also inherently intermingle attributes with very different distributional characteristics. For instance, there might be only a dozen or so colors of shoes, so the number of people who have expressed a preference for some kind of red shoe is going to be vastly larger than the number who have expressed a preference for a specific color of a specific size of a specific model of shoe. It is common for recommendation systems to fail for very common things or for very rare things, and integrating both pathological situations into a single recommendation framework may be a problem.

My own experience is that it is common for one kind of attribute to dominate the recommendation process, in the sense of providing the most oomph and accuracy. This can be because the data is sparse and some attributes provide useful smoothing, or it can be because some attributes are too general and other attributes provide more precision. At Musicmatch, for instance, the artist attribute provided a disproportionate share of the music recommendation value, above track or album or even song (track != song, because it is common for the same song to be on many albums, giving many tracks). I think that this must only be true to first order and that, if you dug in, you would find minority classes where different attributes provide different amounts of data, but it is rare in startups to get past the first-order solution.
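To make that integration step concrete, here is a rough sketch of the kind of glue code I mean. All of the class, method, and field names are hypothetical (nothing here is Mahout API): it just weights the recommended attributes by rank, scores each item by summing rankWeight * idf over the attributes it carries, and returns the top-scoring items.

    import java.util.*;

    // Rough sketch only -- hypothetical names, not Mahout API.
    // Integrates a ranked list of recommended attributes into a ranked list of
    // items by scoring each item with sum(rankWeight * idf) over its attributes.
    public class AttributeIntegration {

      // items: itemId -> attribute labels it carries (e.g. "color:red", "artist:Miles Davis")
      // recommendedAttributes: attributes in rank order, best first
      // attributeItemCounts: attribute -> number of items carrying it (document frequency)
      public static List<String> recommendItems(Map<String, Set<String>> items,
                                                 List<String> recommendedAttributes,
                                                 Map<String, Integer> attributeItemCounts,
                                                 int howMany) {
        int numItems = items.size();

        // weight recommended attributes by rank: 1, 1/2, 1/3, ...
        Map<String, Double> rankWeight = new HashMap<String, Double>();
        for (int rank = 0; rank < recommendedAttributes.size(); rank++) {
          rankWeight.put(recommendedAttributes.get(rank), 1.0 / (rank + 1));
        }

        // score items: sum of rankWeight * idf over the recommended attributes each item carries
        final Map<String, Double> itemScore = new HashMap<String, Double>();
        for (Map.Entry<String, Set<String>> entry : items.entrySet()) {
          double score = 0;
          for (String attribute : entry.getValue()) {
            Double w = rankWeight.get(attribute);
            if (w != null) {
              double idf = Math.log((double) numItems / attributeItemCounts.get(attribute));
              score += w * idf;
            }
          }
          if (score > 0) {
            itemScore.put(entry.getKey(), score);
          }
        }

        // return the top-scoring items
        List<String> ranked = new ArrayList<String>(itemScore.keySet());
        Collections.sort(ranked, new Comparator<String>() {
          public int compare(String a, String b) {
            return Double.compare(itemScore.get(b), itemScore.get(a));
          }
        });
        return ranked.subList(0, Math.min(howMany, ranked.size()));
      }
    }

The 1/rank weighting and the log idf are just placeholders; the point is only that the integration layer can be quite thin.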
On Tue, Jan 26, 2010 at 1:44 PM, Sean Owen <sro...@gmail.com> wrote:
> I want to knock down some support for content based recommendation.
> And I want to solicit ideas about what this even means to its intended
> audience -- users.
>
> I define it broadly as a recommender in which:
> - items have attributes (e.g. books have genres, titles, authors)
> rather than being completely opaque entities
> - users have affinities for attributes
> - users are recommended items with attributes they like
>
> I would narrow and specify this, in the context of Mahout, to have a
> collaborative filtering angle:
> - items have attributes, still
> - users have preferences for items (classic CF)
> - (therefore, users implicitly have affinities for attributes)
> - item similarity can be defined in terms of item attributes, in some way
> - users are recommended items that are similar to other items they
> like (item-based recommendation)
> - (therefore, users are recommended items with attributes they like)
>
> This is my spin on content based recommendation in Mahout. I define it
> as a special case of item-based recommendation. Thoughts?
>
> So, the idea is to provide some non-trivial framework for supporting
> item attributes, and defining similarity in terms of attributes.
> Thoughts on what that should look like?
>
> Sean
>

--
Ted Dunning, CTO
DeepDyve