Nice, well, that is pretty much the definition of "item-based
collaborative filtering"! It would indeed be interesting to see how it
scales. This doesn't include a notion of item ratings (well, maybe the
"documents" could repeat an item token several times to indicate a
stronger association), but ratings are not a necessary condition for good
recommendations. I think the equivalent in CF is a combination of 1)
an item-based recommender and 2) the log-likelihood similarity metric.
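
To make (2) concrete, here is a rough sketch of a log-likelihood ratio
score computed from a 2x2 cooccurrence table for a pair of items. The
function names and the k11..k22 argument layout are my own choices for
illustration, not anything taken from Lucene or a particular CF library:
k11 counts users who have both items, k12 and k21 count users who have
only one of them, and k22 counts users who have neither.

```python
import math


def x_log_x(x):
    # Convention: 0 * log(0) is treated as 0.
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts):
    # Unnormalized Shannon entropy of a list of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def log_likelihood_ratio(k11, k12, k21, k22):
    """LLR score for a 2x2 cooccurrence table (hypothetical helper).

    k11: users with both items      k12: users with item A only
    k21: users with item B only     k22: users with neither item
    """
    row_entropy = entropy(k11 + k12, k21 + k22)
    col_entropy = entropy(k11 + k21, k12 + k22)
    mat_entropy = entropy(k11, k12, k21, k22)
    # Clamp at zero to absorb floating-point round-off.
    return max(0.0, 2.0 * (row_entropy + col_entropy - mat_entropy))


def item_llr(histories, item_a, item_b):
    """Build the 2x2 table from an iterable of per-user item sets."""
    k11 = k12 = k21 = k22 = 0
    for items in histories:
        has_a, has_b = item_a in items, item_b in items
        if has_a and has_b:
            k11 += 1
        elif has_a:
            k12 += 1
        elif has_b:
            k21 += 1
        else:
            k22 += 1
    return log_likelihood_ratio(k11, k12, k21, k22)
```

A score near zero means the two items cooccur about as often as chance
would predict; larger scores mean the cooccurrence is more surprising.
Item pairs above some threshold would be the "interesting" ones to keep.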


On Mon, Jul 13, 2009 at 4:11 PM, Ted Dunning<[email protected]> wrote:
> Also, Lucene automagically does weighting which is very, very similar to
> exactly what you want.
>
> To Sean's question, the trick is that Lucene can store a list of item-item
> links that were filtered by cooccurrence statistics to form a binary matrix
> of interesting links.  Then if you query with a user's recent history of
> items as a query, you get back a list of items formed by considering
> different items to be weighted according to rarity.
>
> The result is quite good, very fast.  The reasons are that Lucene *is*
> weighted matrix multiplication of just the right sort.  This is what I was
> going to talk about in detail at ApacheCon.
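
As a toy illustration of the idea quoted above, the sketch below treats
each item as a "document" whose terms are its filtered item-item links,
and scores candidates against a user's recent history with a smoothed
inverse-document-frequency weight so that rarer items count for more.
It is plain Python, not Lucene, and it is only a simplified stand-in for
Lucene's real scoring; `recommend`, `indicators`, and the exact weight
formula are assumptions made for the example.

```python
import math
from collections import defaultdict


def recommend(indicators, history, top_n=3):
    """Rank items whose link sets overlap the user's history.

    indicators: dict mapping item -> set of linked items (the rows of
                the binary matrix of interesting links)
    history:    set of items the user recently interacted with
    """
    n_docs = len(indicators)

    # Document frequency: in how many link sets does each item appear?
    doc_freq = defaultdict(int)
    for links in indicators.values():
        for item in links:
            doc_freq[item] += 1

    scores = {}
    for candidate, links in indicators.items():
        if candidate in history:
            continue  # do not recommend what the user already has
        # Sum a rarity weight for every history item found in the links.
        score = sum(
            1.0 + math.log(n_docs / (doc_freq[term] + 1))
            for term in history
            if term in links
        )
        if score > 0:
            scores[candidate] = score

    # Highest score first; ties broken by item name for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))[:top_n]


indicators = {
    "w": {"common"},
    "x": {"common"},
    "y": {"rare"},
    "z": {"common"},
}
ranked = recommend(indicators, {"common", "rare"})
```

Here "y" ranks first because it matches the rare history item, while
"w", "x", and "z" match only the common one. Summing weights over the
matching entries of a binary row is one row of a weighted matrix-vector
product, which is the sense in which the query behaves like matrix
multiplication.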
