Re: [gsoc] Collaborative filtering algorithms

Sean Owen Thu, 05 Mar 2009 01:24:42 -0800

On Thu, Mar 5, 2009 at 7:27 AM, QIU, Yin wrote:
> I don't know slope one recommender yet. Maybe I should read that first
> to know how you manage to divide the tasks. However, a little
> explanation in advance would be appreciated.


http://en.wikipedia.org/wiki/Slope_One explains slope one pretty well.
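
For a quick feel of the idea before reading the article, here is a minimal
sketch of weighted slope one in plain Python. It is not the Mahout code; the
function names (build_diffs, predict) and the toy ratings are made up for
illustration, and everything is held in memory on one machine.

```python
from collections import defaultdict

def build_diffs(prefs):
    """prefs: {user: {item: rating}}.
    Returns (avg, count), both keyed by (item_i, item_j), where avg is the
    mean of rating_i - rating_j over users who rated both items."""
    total = defaultdict(float)
    count = defaultdict(int)
    for ratings in prefs.values():
        for i, ri in ratings.items():
            for j, rj in ratings.items():
                if i != j:
                    total[(i, j)] += ri - rj
                    count[(i, j)] += 1
    avg = {pair: total[pair] / count[pair] for pair in total}
    return avg, dict(count)

def predict(user_ratings, item, avg, count):
    """Weighted slope one estimate of `item` for one user, or None if no
    item the user rated has been co-rated with `item`."""
    num = 0.0
    den = 0
    for j, rj in user_ratings.items():
        pair = (item, j)
        if j != item and pair in avg:
            n = count[pair]          # weight by number of co-raters
            num += (avg[pair] + rj) * n
            den += n
    return num / den if den else None

# Hypothetical toy data: three users, three items.
prefs = {
    "john": {"A": 5, "B": 3, "C": 2},
    "mark": {"A": 3, "B": 4},
    "lucy": {"B": 2, "C": 5},
}
avg, count = build_diffs(prefs)
lucy_a = predict(prefs["lucy"], "A", avg, count)
```

The pairwise difference table is the part that looks like a natural fit for a
distributed job, since each user's ratings contribute to it independently.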

> By "refinement and enhancement", could you be more specific?

Really, I have never run this code in a real Hadoop environment. There
could be bugs, or improvements, that fall out from that. For example
there might be some more efficient way to use Hadoop that I don't see.
I don't have anything specific in mind -- these are unknown-unknowns
to me. But I think this could form part of a decent project.

> First, as Hofmann introduced pLSA into CF [3] and I heard SVD on
> MapReduce had been tackled (is that true?), is it possible to port his
> algorithm to Mahout?

This would be a fantastic project, implementing a Recommender based on
this approach. I tried implementing an SVD technique a couple years
ago and it was waaay too slow on one machine. Revisiting with Hadoop
sounds great.

> Another one. I know that Canny proposed an algorithm [1] that runs on
> different nodes, theoretically without a central database, though for
> the sake of privacy. Wang et al. also suggested CF for P2P systems
> [2]. But I don't know if they are helpful for defining Hadoop jobs.

It's interesting, and I personally find this a worthy project too. On
my list of priorities, I don't find a Recommender that prioritizes
privacy or minimizing information sharing as compelling. In most
real-world cases where exposing preference data might be a concern, I
think it can be solved by just using opaque user/item IDs or
something. But, I wouldn't object if someone thought they could
implement this usefully.
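
To make the opaque ID suggestion concrete, here is one possible sketch,
assuming a keyed hash (HMAC-SHA256) is an acceptable way to derive the IDs.
The function name opaque_id, the key, and the 16-character truncation are all
hypothetical choices for illustration, not anything that exists in Mahout.

```python
import hashlib
import hmac

def opaque_id(raw_id, secret_key):
    """Map a real user/item ID to a stable, opaque token.
    The same raw_id and key always give the same token, so preference
    data still lines up, but the token does not reveal the raw ID
    to someone who lacks the key."""
    digest = hmac.new(secret_key, raw_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]   # truncated for readability (assumption)

# Hypothetical key and preference rows: (user, item, rating).
key = b"example-secret-key"
rows = [("alice@example.com", "item-42", 4.0),
        ("bob@example.com", "item-42", 2.0)]
masked = [(opaque_id(u, key), opaque_id(i, key), r) for u, i, r in rows]
```

The recommender only ever sees the masked rows; whoever holds the key keeps
the mapping back to real users and items.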
