On Thu, Mar 5, 2009 at 1:08 PM, QIU, Yin <[email protected]> wrote:
> Glad that you are so positive about this. I just googled and found the
> article addressing parallel SVD [1], which was devised by Google. I
> shall spend some time reading this. If we are really going to do this
> project, implementing only the SVD part would be, in my opinion, good
> enough. We can leave implementation of those algorithm relying on SVD
> as later work.

I agree that getting a parallel SVD running is probably a good project in itself in terms of size. On the other hand, it would be better to end up with a basic recommender as a final product. But even if SVD alone doesn't make up a complete unit for collaborative filtering purposes, it does seem interesting enough as a unit within the broader mandate of Mahout as a machine learning project. So I personally could support this as a project.

I suppose I'd say the first step is to see if anyone's done SVD on Hadoop yet, and if so, move on to finishing the recommender. If not, SVD is useful by itself.

> Privacy was not my concern. I was talking about whether we can get
> some inspiration from the idea that the CF process can be distributed
> across multiple nodes, though unfortunately, I haven't got a clue :(

I think you have a good point. Skimming the paper again, I get the sense that it is itself a sort of distributed SVD approach, so perhaps it is the same idea as above. All in all this sounds like a great area for a project, in my opinion.
