Simple co-occurrence counting is at the heart of most large-scale recommendation systems. Counting plus simple (but sound) statistical filtering suffices for a broad range of recommendation tasks and gives very high-quality results. For statistical filtering, I typically recommend the G^2 statistic as a heuristic score (see my blog post about surprise and coincidence <http://tdunning.blogspot.com/2008/03/surprise-and-coincidence.html> for details). These are completely scalable algorithms, but Mahout doesn't have an implementation.
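A minimal single-machine sketch of count-then-filter, assuming each user history is a list of item IDs and that a pair's 2x2 table is built from user counts (both items, only A, only B, neither). The names `g2` and `score_pairs` are hypothetical illustrations, not Mahout APIs. The score is the standard G^2 form, 2 * sum(k_ij * ln(k_ij / E_ij)), with E_ij = row_i * col_j / N. Note that G^2 is also large for pairs that co-occur *less* than chance would predict, and this sketch does not separate those from positive associations.

```python
import math
from collections import Counter
from itertools import combinations


def g2(k11, k12, k21, k22):
    """G^2 (log-likelihood ratio) statistic for a 2x2 contingency table.

    k11: users with both A and B, k12: A without B,
    k21: B without A, k22: neither.
    """
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1))
    total = 0.0
    for k, i, j in cells:
        if k > 0:  # the 0 * log(0) term is taken as 0
            expected = rows[i] * cols[j] / n
            total += k * math.log(k / expected)
    return 2.0 * total


def score_pairs(histories):
    """Count item co-occurrence across user histories, then score each pair with G^2.

    Hypothetical helper: each history is treated as a set of items, so
    repeated interactions with the same item count once per user.
    """
    n_users = len(histories)
    item_counts = Counter()
    pair_counts = Counter()
    for items in histories:
        unique = sorted(set(items))
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    scores = {}
    for (a, b), k11 in pair_counts.items():
        k12 = item_counts[a] - k11
        k21 = item_counts[b] - k11
        k22 = n_users - k11 - k12 - k21
        scores[(a, b)] = g2(k11, k12, k21, k22)
    return scores


if __name__ == "__main__":
    histories = [["a", "b"], ["a", "b"], ["c"], ["c"]]
    print(score_pairs(histories))
```

For example, `g2(10, 10, 10, 10)` is 0 (no association), while `g2(10, 0, 0, 10)` is 40 * ln 2. The counting phase only needs per-item and per-pair totals, which is why it maps naturally onto a distributed counting job; the scoring phase is then independent per pair.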
On Wed, Mar 4, 2009 at 12:55 AM, Sean Owen <[email protected]> wrote:

> I do not know of an algorithm which is by nature efficiently distributable.
> Finding and implementing such a thing would be great.

--
Ted Dunning, CTO
DeepDyve
