>I agree that getting a parallel SVD running is in and of itself
>probably a good project in terms of size. On the other hand it would
>be better to end up with a basic recommender as a final product. But
>even if SVD by itself doesn't make up a complete unit by itself for
>collaborative filtering purposes, it does seem interesting enough as a
>unit within the broader mandate of Mahout as a machine learning
>project. So I personally could support this as a project indeed.
>I suppose I'd say the first step is to see if anyone's done SVD on
>Hadoop yet, and if so, finish the recommender. If not, SVD is useful
>by itself.

I would like to take this thread forward. I did not see any discussion of this in the archive (at least under this subject or by these authors), hence the questions. I am interested in implementing a MapReduce-based collaborative filtering approach.

My background:

I am Atul Kulkarni, a Master's student at the University of Minnesota Duluth, working on the Netflix Prize problem for my graduate project. For my Machine Learning class project, I have studied and am implementing some of the collaborative filtering methods proposed for the Netflix Prize problem, as described in [1] and [3]. For my graduate dissertation project on the Netflix problem, I am working under Dr. Richard Maclin. I am implementing a variant of this method that bases predictions on data about movies from external sources, forms representative users, and applies nearest-neighbor algorithms to predict the rating for the active user.

I have done distributed programming using C/MPI for my architecture class, where I implemented the "Top ranked Phrases in Corpus" project, which required us to find the top R phrases of word length N from the GigaWord Corpus. The project details and the project report can be found at http://trpc.sourceforge.net/trpc/Home.html

I do not have extensive experience with Hadoop, but I have watched the MapReduce framework videos by Google and am reading the paper [4] to learn about MapReduce.

My questions:

Is anyone working on the SVD part, or is there any SVD implementation on Hadoop? If there is, then I would like to implement the matrix factorization methods described in [1], [2], and [3] specifically. I found one paper on a parallel implementation for the Netflix Prize problem [5], but I am not sure whether anyone already has an implementation in place.
If there is no implementation of SVD on Hadoop and no one else is working on one, I would like to implement it, and then implement the methods described in [1] as one of the recommenders for the Netflix problem. I have worked with the Netflix Prize problem, so most of my suggested algorithms revolve around it, but I am open to other algorithms that might be out there. Is this a good thing to do?

References:

1. A. Paterek. Improving regularized singular value decomposition for collaborative filtering. In Proc. of KDD Cup Workshop at SIGKDD'07, 13th ACM Int. Conf. on Knowledge Discovery and Data Mining, pages 39-42, San Jose, CA, USA, 2007.
2. Gábor Takács, István Pilászy, Bottyán Németh, Domonkos Tikk. Scalable Collaborative Filtering Approaches for Large Recommender Systems. JMLR 10(Mar):623-656, 2009.
3. http://sifter.org/~simon/journal/20061211.html
4. J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI'04: Sixth Symposium on Operating System Design and Implementation, San Francisco, CA, December 2004.
5. Yunhong Zhou, Dennis Wilkinson, Robert Schreiber, and Rong Pan. Large-Scale Parallel Collaborative Filtering for the Netflix Prize. LNCS, Algorithmic Aspects in Information and Management, Volume 5034/2008. (I could not find the exact reference for this paper; it can be found here: <http://www.springerlink.com/content/j1076u0h14586183/>.)

--
Regards,
Atul Kulkarni
Teaching Assistant,
Department of Computer Science,
University of Minnesota Duluth
Duluth. 55805.
