Nice, that is pretty much the definition of "item-based collaborative filtering"! It would indeed be interesting to see how it scales. This approach has no notion of item ratings (though the "documents" could repeat an item token several times to indicate a stronger association), but ratings are not a necessary condition for good recommendations. I think the equivalent in CF is a combination of 1) an item-based recommender and 2) the log-likelihood similarity metric.
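For reference, here is a rough sketch of the log-likelihood ratio I have in mind, computed from a 2x2 cooccurrence table for a pair of items. The function names and the count layout (k11 = users with both items, k12 and k21 = users with only one of them, k22 = users with neither) are my own choices for illustration, not anyone's actual API.

```python
import math

def x_log_x(x):
    # Convention: 0 * log(0) is treated as 0.
    return 0.0 if x == 0 else x * math.log(x)

def entropy(*counts):
    # Unnormalized Shannon entropy of a set of counts: N * H(counts / N).
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)

def llr(k11, k12, k21, k22):
    # Log-likelihood ratio (G^2) for a 2x2 contingency table.
    # Large values mean the two items cooccur more (or less) than chance predicts.
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    # Clamp tiny negative values caused by floating-point round-off.
    return max(0.0, 2.0 * (row + col - mat))

# Usage: independent items score near zero, perfectly associated items score high.
independent = llr(10, 10, 10, 10)
associated = llr(10, 0, 0, 10)
```

Item pairs whose score clears some threshold would be kept as "interesting" links; the threshold itself is a tuning choice this sketch does not make.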
On Mon, Jul 13, 2009 at 4:11 PM, Ted Dunning <[email protected]> wrote:
> Also, Lucene automagically does weighting which is very, very similar to
> exactly what you want.
>
> To Sean's question, the trick is that Lucene can store a list of item-item
> links that were filtered by cooccurrence statistics to form a binary matrix
> of interesting links. Then if you query with a user's recent history of
> items as a query, you get back a list of items formed by considering
> different items to be weighted according to rarity.
>
> The result is quite good, very fast. The reasons are that Lucene *is*
> weighted matrix multiplication of just the right sort. This is what I was
> going to talk about in detail at ApacheCon.
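
To make sure I follow the idea, here is a toy in-memory sketch of what is described above, without Lucene itself. Each item is a "document" whose tokens are the items it is linked to (the binary matrix), and a user's history is the query. The data, the function names, and the simple IDF-style weight are all assumptions for illustration; Lucene's real scoring formula has more factors than this.

```python
import math

# Hypothetical binary link matrix: item -> items that survived cooccurrence filtering.
links = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
    "E": {"C"},
}

def idf(token, links):
    # Rarity weight: tokens that appear in few item documents count for more.
    df = sum(1 for linked in links.values() if token in linked)
    return math.log(1 + len(links) / df) if df else 0.0

def recommend(history, links, top_n=3):
    # Score each candidate item by the summed rarity weight of the history
    # items found among its links. This amounts to a sparse weighted
    # matrix-vector product between the link matrix and the history vector.
    scores = {}
    for item, linked in links.items():
        if item in history:
            continue  # do not recommend what the user already has
        score = sum(idf(h, links) for h in history if h in linked)
        if score > 0:
            scores[item] = score
    # Highest score first; ties broken by item name for stable output.
    return sorted(scores, key=lambda i: (-scores[i], i))[:top_n]

# Usage: query with a user's recent history of items.
recs = recommend({"A", "B"}, links)
```

A real index would do the same thing with postings lists rather than a full scan, which I assume is where the speed comes from.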
