Hi!

> Yes there is a framework in the code for running a Recommender across
> machines in Hadoop, and a Hadoop job which distributes part of the
> processing for a slope one recommender.

I don't know the slope one recommender yet. Maybe I should read up on it first to understand how you divide the tasks. However, a brief explanation in advance would be appreciated.

> Both could use testing, refinement and enhancement.

By "refinement and enhancement", could you be more specific?

> I do not know of an algorithm which is by nature efficiently distributable.
> Finding and implementing such a thing would be great.

Actually, I don't know of one either. But I have two naive ideas.

First, since Hofmann introduced pLSA into CF [3] and I heard that SVD on MapReduce has been tackled (is that true?), would it be possible to port his algorithm to Mahout?

Second, I know that Canny proposed an algorithm [1] that runs on different nodes, theoretically without a central database, though its motivation was privacy. Wang et al. also suggested CF for P2P systems [2]. But I don't know whether these are helpful for defining Hadoop jobs.

> I would be the person to contact about this so feel free to run your
> proposals by me.

I get it. And I won't let this discussion go off the list. :-)

[1] J. Canny. Collaborative Filtering with Privacy. In Proceedings of the IEEE Symposium on Security and Privacy, 2002.
[2] J. Wang, J. Pouwelse, R. L. Lagendijk and M. J. T. Reinders. Distributed collaborative filtering for peer-to-peer file sharing systems. In SAC '06: Proceedings of the 2006 ACM Symposium on Applied Computing, pp. 1026-1030, 2006.
[3] T. Hofmann. Latent semantic models for collaborative filtering. ACM Transactions on Information Systems, volume 22, pp. 89-115, 2004.

--
Yin Qiu
