On Thu, Mar 5, 2009 at 7:27 AM, QIU, Yin <[email protected]> wrote:
> I don't know slope one recommender yet. Maybe I should read that first
> to know how you manage to divide the tasks. However, a little
> explanation in advance would be appreciated.

http://en.wikipedia.org/wiki/Slope_One explains slope one pretty well.

> By "refinement and enhancement", could you be more specific?

Really, I have never run this code in a real Hadoop environment. There could be bugs, or improvements, that fall out from that. For example, there might be some more efficient way to use Hadoop that I don't see. I don't have anything specific in mind -- these are unknown unknowns to me. But I think this could form part of a decent project.

> First, as Hofmann introduced pLSA into CF [3] and I heard SVD on
> MapReduce had been tackled (is that true?), is it possible to port his
> algorithm to Mahout?

This would be a fantastic project: implementing a Recommender based on this approach. I tried implementing an SVD technique a couple of years ago and it was waaay too slow on one machine. Revisiting it with Hadoop sounds great.

> Another one. I know that Canny proposed an algorithm [1] that runs on
> different nodes, theoretically without a central database, though for
> the sake of privacy. Wang et al. also suggested CF for P2P systems
> [2]. But I don't know if they are helpful for defining Hadoop jobs.

It's interesting, and I personally find this a worthy project too. That said, on my list of priorities, a Recommender that prioritizes privacy or minimizes information sharing is not as compelling. In most real-world cases where exposing preference data might be a concern, I think it can be solved by just using opaque user/item IDs or something similar. But I wouldn't object if someone thought they could implement this usefully.
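
For anyone who wants the gist of slope one before reading the Wikipedia page, here is a minimal single-machine sketch of the weighted variant described there. It is only an illustration, not the Mahout code and not a description of how the Hadoop job divides the work; the function names `build_diffs` and `predict` and the sample ratings are made up for this example.

```python
from collections import defaultdict


def build_diffs(ratings):
    """Accumulate rating differences for every pair of co-rated items.

    ratings: {user: {item: rating}}
    Returns (diff_sum, count), both keyed by (item_j, item_i), where
    diff_sum holds the sum of (r_j - r_i) over users who rated both items
    and count holds how many such users there were.
    """
    diff_sum = defaultdict(float)
    count = defaultdict(int)
    for prefs in ratings.values():
        for j, r_j in prefs.items():
            for i, r_i in prefs.items():
                if i != j:
                    diff_sum[(j, i)] += r_j - r_i
                    count[(j, i)] += 1
    return diff_sum, count


def predict(prefs, target, diff_sum, count):
    """Weighted slope one estimate of one user's rating for `target`.

    prefs: {item: rating} for that user.
    Returns None when no item the user rated was co-rated with `target`.
    """
    numerator = 0.0
    denominator = 0
    for i, r_i in prefs.items():
        c = count.get((target, i), 0)
        if i == target or c == 0:
            continue
        avg_diff = diff_sum[(target, i)] / c
        # Each co-rated item votes with weight equal to its support.
        numerator += (avg_diff + r_i) * c
        denominator += c
    return numerator / denominator if denominator else None


# Hypothetical sample data, chosen only to exercise the functions.
ratings = {
    "john": {"A": 5, "B": 3, "C": 2},
    "mark": {"A": 3, "B": 4},
    "lucy": {"B": 2, "C": 5},
}
diff_sum, count = build_diffs(ratings)
print(predict(ratings["lucy"], "A", diff_sum, count))  # about 4.33
```

The first function only sums and counts differences per item pair, while the second reads those aggregates for a single user, so the two steps are independent of each other once the aggregates exist.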
