Hi Sean,
----- Original Message ----
> From: Sean Owen <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Wednesday, October 1, 2008 7:18:08 AM
> Subject: Re: Recommending when working with binary data sets
>
> On Tue, Sep 30, 2008 at 7:55 PM, Otis Gospodnetic wrote:
> > I think their algo then translates to:
> >
> > for item i1 in {1, 2, 3 ... 100} {
> >   for item i2 in {1, 10, 20} {
> >     computeSimilarity(i1, i2)
> >   }
> > }
>
> Yes, though you would not consider recommending items 1, 10, or 20,
> since the user has already expressed a preference for them.

Right.

> > sim(1, 1) = 1.0
>
> (... and you therefore wouldn't have to compute this)
>
> > sim(10, 1) = same as sim(1, 10)
>
> (... sim(x,x) isn't recorded in the similarity matrix, and sim(x,y) is
> only recorded for x < y to avoid this redundancy, because we assume
> it's symmetric. You know all this, just commenting.)

Right.

> > So, then we have a list of the most similar items for each item:
> > item 1: 20, 10
> > item 10: 20, 1
> > item 20: 10, 1
>
> Yes, those are the most similar items, but of course they are, since
> those are the very items the user has already expressed a preference
> for!

Right, right, but isn't that what
http://dsonline.computer.org/portal/site/dsonline/menuitem.9ed3d9924aeb0dcd82ccc6716bbe36ec/index.jsp?&pName=dso_level1&path=dsonline/2003_Archives/0301/d&file=wp1lind.xml&xsl=article.xsl
describes? If I understand the article correctly, they go through several
possible approaches to building REs -- user-item CF (too expensive), user
clustering (low quality), search-based (low quality) -- and then present
their novel item-to-item CF as something that doesn't have
performance/scale/quality problems. I think they said it works as in my
example above, but that doesn't make sense to me: by doing the above, all
I'm doing is looking at similarity between items Joe has already
consumed, and I'm really after *new* things to recommend.

> > Is this interpretation correct?
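(For concreteness, here is how I picture that bookkeeping -- just a sketch with made-up data and names, using Jaccard similarity since my preferences are binary:)

```python
# Sketch of the loop above: made-up binary data, item -> set of consumers.
consumers = {
    1:  {"joe", "ann", "bob"},
    2:  {"ann", "bob"},
    10: {"joe", "ann"},
    20: {"joe", "bob"},
}

def jaccard(i1, i2):
    """Similarity of two items as overlap of the users who consumed them."""
    a, b = consumers[i1], consumers[i2]
    return len(a & b) / len(a | b)

# Store sim(x, y) only for x < y: the metric is symmetric and sim(x, x) = 1.
sims = {(x, y): jaccard(x, y)
        for x in consumers for y in consumers if x < y}

def sim(x, y):
    return 1.0 if x == y else sims[(min(x, y), max(x, y))]
```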
> > If it is, I don't quite yet see the benefit of computing similarity
> > between items for a single person. Of course they will be similar,
> > unless we want to create groups of items for a single person with a
> > diverse set of interests for some reason.
>
> Not quite. What you are doing is computing an estimated preference for
> everything the user hasn't already expressed a preference for. And
> then of course you recommend those with the highest estimated
> preference.

Right, that's page 23 in PCI. I thought the Amazonians were talking
about something different, but it's starting to sound like they are
talking about the same thing. In my case I'm interested in working with
binary preferences, though, so if I understand things correctly a
weighted average doesn't really make sense in that context. Is this
correct?

> You do this by computing a weighted average. For every other item in
> the universe (2, 3, 4, ...), compute its similarity to all the user's
> preferred items (1, 10, 20), and sum those similarities times the
> user's preferences for those items.

This sounds exactly like what's described on pages 22-25 of Programming
Collective Intelligence. My understanding was that what's in PCI is "an
old, known approach" whereas what's in that article about Amazon is a
"new, different, and patented approach". Are they both actually talking
about the same algo? That is, is the Amazon article describing the same
thing as what's on pages 22-25 in PCI?

> User-based recommenders work the same way but kinda turned on their
> side -- you compute similar *users* and then...
>
> > Wouldn't one need to build an item-item similarity matrix for all
> > Joe's items (1, 10, 20) and all new items (101-200)?
>
> Yes.
>
> > If so, what's the point of figuring item-item similarity for all
> > items Joe already consumed?
>
> There isn't.
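To make sure I follow the weighted-average step, here's how I'd sketch it (my own names, toy similarities) -- and note what happens with binary preferences:

```python
def estimate_preference(user_prefs, candidate, sim):
    """Weighted average: each known preference weighted by the candidate's
    similarity to the preferred item."""
    num = sum(sim(candidate, item) * pref for item, pref in user_prefs.items())
    den = sum(sim(candidate, item) for item in user_prefs)
    return num / den if den else 0.0

def toy_sim(a, b):
    # Hypothetical fixed similarities, just for illustration.
    table = {frozenset((2, 1)): 0.9,
             frozenset((2, 10)): 0.4,
             frozenset((2, 20)): 0.1}
    return table.get(frozenset((a, b)), 0.0)

# With graded preferences the weighted average discriminates between items:
graded = {1: 5.0, 10: 3.0, 20: 1.0}
# With binary preferences every pref is 1.0, so the numerator equals the
# denominator and the estimate is 1.0 for *every* candidate -- ranking
# would have to fall back on the raw similarity sum instead.
binary = {1: 1.0, 10: 1.0, 20: 1.0}
```

With `binary`, `estimate_preference(binary, 2, toy_sim)` comes out 1.0 regardless of the candidate, which is exactly why a weighted average seems meaningless for binary data and the raw similarity sum looks like the thing to rank by.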
:) OK, so the summary is that you still have to compute an item1-item2
similarity matrix where item1 comes from the set of items consumed by
Joe and item2 comes from the set of items I want to consider for
recommendation (which can be only a subset of all items, e.g. only the
items that are new since the last time Joe was recommended some items
three days ago). If my assumption is right that computing a weighted
average doesn't make sense in the context of binary preferences, then
the above is really it: it all boils down to computing the item1-item2
matrix based on item-item similarity. While doing that, we try to keep
that item1-item2 matrix as small as possible and try to compute it
ahead of time, so that at "give Joe N recommendations" time we can
simply take the top N from that item1-item2 matrix. Is this correct, or
am I missing some pieces of the puzzle?

> Well...
>
> Let me digress a bit to perhaps explain why item-based recommenders
> have some interesting power.
>
> Often, your items are such that you have some a priori, external idea
> of how similar they are -- beyond just a notion based on user
> preferences. For example, you might say that two mystery books are
> similar and that a mystery and a sci-fi book are not. This could form
> the basis of an item-item similarity metric. This is good because:
>
> 1) it's more, external info being injected into the system, which
> could improve recommendations, and
> 2) it is often quite cheap to compute this notion -- to define the
> complete item-item similarity matrix cheaply. It's not cheap to
> compute the whole thing based on, say, Pearson correlations between
> items based on preferences!
>
> And interestingly, this reasoning doesn't apply so much to user-based
> recommenders. You usually don't have an a priori notion of user-user
> similarity, and these notions are likely to change as you learn more
> about *users*, whereas we expect *item* similarity to be relatively
> stable the more we learn.
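Sean's point (2) clicks for me if I imagine a metadata-based metric, e.g. genre overlap -- a sketch with invented genres, where filling the full item-item matrix needs no preference data at all:

```python
# Hypothetical item metadata: item id -> set of genre tags (invented).
genres = {
    1:   {"mystery"},
    10:  {"mystery", "thriller"},
    20:  {"sci-fi"},
    101: {"mystery"},
    102: {"sci-fi", "thriller"},
}

def genre_sim(i1, i2):
    """A priori item-item similarity from metadata alone: Jaccard overlap
    of genre tags. No user preferences needed, so the complete item-item
    matrix is cheap to fill in, and it stays stable as new users arrive."""
    a, b = genres[i1], genres[i2]
    return len(a & b) / len(a | b) if a | b else 0.0

# Two mysteries come out similar; a mystery vs. a sci-fi book comes out 0,
# which is exactly Sean's example.
```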
Si si :)

> That's why item-based recommenders are not just the reverse of
> user-based recommenders, but a bit interesting in their own right.
> It's not 100% symmetric.

Otis
