Hi Sean,

> Really, I have never run this code in a real Hadoop environment. There
> could be bugs, or improvements, that fall out from that. For example
> there might be some more efficient way to use Hadoop that I don't see.
> I don't have anything specific in mind -- these are unknown-unknowns
> to me. But I think this could form part of a decent project.

Okay. I won't comment on this until I get to know Slope One.

> This would be a fantastic project, implementing a Recommender based on
> this approach. I tried implementing an SVD technique a couple years
> ago and it was waaay too slow on one machine. Revisiting with Hadoop
> sounds great.

Glad that you are so positive about this. I just googled and found an
article addressing parallel SVD [1], an approach devised at Google. I
shall spend some time reading it. If we really go ahead with this
project, implementing only the SVD part would be, in my opinion, good
enough. We can leave the implementation of the algorithms that rely on
SVD as later work.

> It's interesting, and I personally find this a worthy project too. On
> my list of priorities, I don't find a Recommender that prioritizes
> privacy or minimizing information sharing as compelling. In most
> real-world cases where exposing preference data might be a concern, I
> think it can be solved by just using opaque user/item IDs or
> something. But, I wouldn't object if someone thought they could
> implement this usefully.

Privacy was not my concern. I was asking whether we can draw some
inspiration from the idea that the CF process can be distributed
across multiple nodes, though unfortunately I haven't got a clue how :(


[1] Gengxin Miao, Yangqiu Song, Dong Zhang, and Hongjie Bai. Parallel
Spectral Clustering Algorithm for Large-Scale Community Data Mining.
http://yqsong.googlepages.com/swsm08_submission_16.pdf. 2008.

-- 
Yin Qiu
