I would define it a bit differently, by recasting recommendation as a machine
learning problem.

Users have preferences for objects with attributes.

We would like to learn from all user/object/attribute preference data to
predict so-far unobserved preferences of a user for other objects.

Normal recommendation is the special case of this in which every object has
exactly one attribute: its id.

We can extend most recommendation algorithms to this new paradigm fairly
transparently by treating each expressed item preference as a bundle of
attribute preferences.  The recommendation algorithm then needs to produce a
list of recommended attributes, which we integrate into a list of recommended
items.  The recommended attributes might be segregated into a separate list
of values for each kind of attribute, or they might be in a single list.  The
segregated approach could simply run one recommendation engine per attribute
type.  The combined approach might just label every attribute with its kind
and throw them all into one soup of preference data.
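A minimal sketch of the explode step, assuming items are stored as a dict from attribute kind to value, with the id kept as one more attribute (so plain item-based data is the case where `id` is the only key). The names `ITEMS`, `explode_combined` and `explode_segregated` are hypothetical, not part of Mahout or any other library. The combined form labels each attribute as `kind:value`; the segregated form returns one preference dataset per attribute kind.

```python
from collections import defaultdict

# Hypothetical catalog: each item is a bundle of typed attributes.
# Plain item-based recommendation is the case where "id" is the only attribute.
ITEMS = {
    "shoe-1": {"id": "shoe-1", "color": "red", "model": "runner"},
    "shoe-2": {"id": "shoe-2", "color": "red", "model": "trail"},
    "shoe-3": {"id": "shoe-3", "color": "blue", "model": "runner"},
}


def explode_combined(prefs, items):
    """Turn (user, item) preferences into (user, 'kind:value') preferences.

    Labeling each attribute with its kind lets all attribute types share
    one pool of preference data (the "soup" approach).
    """
    out = []
    for user, item in prefs:
        for kind, value in items[item].items():
            out.append((user, f"{kind}:{value}"))
    return out


def explode_segregated(prefs, items):
    """Split attribute preferences by kind: one (user, value) dataset per
    attribute kind, so a separate recommender could be run on each."""
    out = defaultdict(list)
    for user, item in prefs:
        for kind, value in items[item].items():
            out[kind].append((user, value))
    return dict(out)
```

Either output can be handed to an ordinary user/item recommender with the attribute strings standing in for item ids; the price is one preference per attribute rather than one per item.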

Most of the additional code is the code that integrates the attribute
recommendations into a list of item recommendations.  This can be as simple
as weighting the recommended attributes by rank and doing rankScore * idf
retrieval to find the items.  Some algorithms, like LDA, can explicitly
integrate the different kinds of attributes.  Others really don't.
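A rough sketch of the rankScore * idf retrieval step, assuming the attribute recommender returns a best-first list of `kind:value` strings and each item is stored as a set of such strings. The 1/(rank+1) weighting, the `CATALOG` data and the function names are illustrative assumptions; any decreasing function of rank would do. Here idf(a) = log(N / df(a)), with N the number of items and df(a) the number of items carrying attribute a, so rare attributes count for more.

```python
import math
from collections import Counter

# Hypothetical catalog: item id -> set of labeled attributes.
CATALOG = {
    "shoe-1": {"color:red", "model:runner"},
    "shoe-2": {"color:red", "model:trail"},
    "shoe-3": {"color:blue", "model:runner"},
    "shoe-4": {"color:red", "model:court"},
}


def idf_table(items):
    """idf(a) = log(N / df(a)), where df(a) counts items having attribute a."""
    n = len(items)
    df = Counter(a for attrs in items.values() for a in set(attrs))
    return {a: math.log(n / d) for a, d in df.items()}


def rank_score(rank):
    # Assumed weighting for a 0-based rank; any decreasing function works.
    return 1.0 / (rank + 1)


def items_from_attributes(recommended_attrs, items, exclude=()):
    """Score each item by the sum of rankScore * idf over the recommended
    attributes it carries; return (item, score) pairs, best first."""
    idf = idf_table(items)
    weights = {a: rank_score(r) for r, a in enumerate(recommended_attrs)}
    scores = {}
    for item, attrs in items.items():
        if item in exclude:  # e.g. items the user already has
            continue
        s = sum(weights[a] * idf.get(a, 0.0) for a in attrs if a in weights)
        if s > 0:
            scores[item] = s
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With `["model:runner", "color:red"]` as the recommended attributes, `shoe-1` (both attributes) ranks first and `shoe-3` (the rarer `model:runner` only) second, ahead of the items that match only the common `color:red`.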

One problem with this is that you are exploding the number of preferences,
which can cause scaling and noise problems.  You also inherently intermingle
attributes with very different distributional characteristics.  For
instance, there might be only a dozen or so shoe colors, so the number of
people who have expressed a preference for some kind of red shoe is going to
be vastly larger than the number who have expressed a preference for a
specific color of a specific size of a specific shoe model.  It is common for
recommendation systems to fail on very common things or on very rare things,
and combining both pathological situations in a single recommendation
framework may be a problem.
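One quick way to see the mismatch is to count distinct users per attribute value within each attribute kind, using the labeled `kind:value` preferences of the combined approach. The function `support_by_kind` and the toy `PREFS` data are hypothetical; on real shoe data you would expect each color to have far more users than each color/size/model combination.

```python
from collections import defaultdict

# Toy labeled preferences; "sku" stands for a model/color/size combination.
PREFS = [
    ("u1", "color:red"), ("u2", "color:red"),
    ("u3", "color:red"), ("u4", "color:red"),
    ("u1", "sku:runner/red/42"), ("u2", "sku:trail/red/44"),
    ("u3", "sku:runner/red/40"), ("u4", "sku:court/red/42"),
]


def support_by_kind(attr_prefs):
    """Return {kind: {value: number of distinct users}} for labeled prefs."""
    users = defaultdict(set)
    for user, attr in attr_prefs:
        users[attr].add(user)
    out = defaultdict(dict)
    for attr, us in users.items():
        kind, value = attr.split(":", 1)  # split on the first ':' only
        out[kind][value] = len(us)
    return dict(out)
```

Here `support_by_kind(PREFS)` gives 4 users for `color:red` but 1 user for every `sku` value: the same four users produce one very dense attribute and several very sparse ones in the same preference soup.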

My own experience is that one kind of attribute commonly dominates the
recommendation process, in the sense of providing the most oomph and
accuracy.  This can be because the data is sparse and some attributes provide
useful smoothing, or because some attributes are too general and others
provide more precision.  At Musicmatch, for instance, the artist attribute
provided a disproportionate share of music recommendation value compared with
track, album, or even song (track != song because the same song commonly
appears on many albums, giving many tracks).  I suspect this is only true to
first order, and that if you dug in you would find minority classes where
different attributes provide different amounts of value, but it is rare in
startups to get past the first-order solution.


On Tue, Jan 26, 2010 at 1:44 PM, Sean Owen <sro...@gmail.com> wrote:

> I want to knock down some support for content-based recommendation.
> And I want to solicit ideas about what this even means to its intended
> audience -- users.
>
> I define it broadly as a recommender in which:
> - items have attributes (e.g. books have genres, titles, authors)
> rather than being completely opaque entities
> - users have affinities for attributes
> - users are recommended items with attributes they like
>
> I would narrow and specify this, in the context of Mahout, to have a
> collaborative filtering angle:
> - items have attributes, still
> - users have preferences for items (classic CF)
> - (therefore, users implicitly have affinities for attributes)
> - item similarity can be defined in terms of item attributes, in some way
> - users are recommended items that are similar to other items they
> like (item-based recommendation)
> - (therefore, users are recommended items with attributes they like)
>
> This is my spin on content-based recommendation in Mahout. I define it
> as a special case of item-based recommendation. Thoughts?
>
> So, the idea is to provide some non-trivial framework for supporting
> item attributes, and defining similarity in terms of attributes.
> Thoughts on what that should look like?
>
> Sean
>



-- 
Ted Dunning, CTO
DeepDyve
