>I agree that getting a parallel SVD running is in and of itself
>probably a good project in terms of size. On the other hand it would
>be better to end up with a basic recommender as a final product. But
>even if SVD doesn't make up a complete unit by itself for
>collaborative filtering purposes, it does seem interesting enough as a
>unit within the broader mandate of Mahout as a machine learning
>project. So I personally could support this as a project indeed.

>I suppose I'd say the first step is to see if anyone's done SVD on
>Hadoop yet, and if so, finish the recommender. If not, SVD is useful
>by itself.

I would like to take this thread forward. I did not find any further
discussion of this in the archive (at least under this subject or by these
authors), hence my questions. I am interested in implementing a
MapReduce-based collaborative filtering approach.

My background:

I am Atul Kulkarni, a Master's student at the University of Minnesota Duluth,
working on the Netflix Prize problem for my graduate project. For my Machine
Learning class project, I have studied and am implementing some of the
collaborative filtering methods proposed for the Netflix Prize in [1] and [3].
For my graduate dissertation project on the Netflix problem, I am working
under Dr. Richard Maclin. There I am implementing a variant of these methods
that bases its predictions on external data sources about movies, forms
representative users, and applies nearest-neighbor algorithms to predict the
rating for the active user.
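
As a rough illustration of the nearest-neighbor step, the sketch below
predicts the active user's rating as a similarity-weighted average of the
ratings given by the k most similar users. It is a generic user-based
neighbor method, not the representative-user variant described above; the
dictionary layout, cosine similarity over co-rated movies, the parameter `k`,
and the fallback to the user's mean rating are all assumptions chosen for
illustration.

```python
import math
from typing import Dict, List, Tuple

# Assumed layout: {user_id: {movie_id: rating}}.
Ratings = Dict[str, Dict[str, float]]


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity restricted to the movies both users have rated."""
    common = set(a) & set(b)
    dot = sum(a[m] * b[m] for m in common)
    norm_a = math.sqrt(sum(a[m] ** 2 for m in common))
    norm_b = math.sqrt(sum(b[m] ** 2 for m in common))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # no overlap (or all-zero ratings): treat users as unrelated
    return dot / (norm_a * norm_b)


def predict_rating(ratings: Ratings, active_user: str, movie: str, k: int = 2) -> float:
    """Similarity-weighted average over the k most similar users who rated `movie`."""
    candidates: List[Tuple[float, float]] = [
        (cosine_similarity(ratings[active_user], ratings[other]), ratings[other][movie])
        for other in ratings
        if other != active_user and movie in ratings[other]
    ]
    # Keep the k neighbors with the highest similarity.
    neighbors = sorted(candidates, key=lambda c: c[0], reverse=True)[:k]
    total_weight = sum(sim for sim, _ in neighbors)
    if total_weight <= 0.0:
        # Hypothetical fallback: the active user's own mean rating.
        own = ratings[active_user]
        return sum(own.values()) / len(own) if own else 0.0
    return sum(sim * rating for sim, rating in neighbors) / total_weight


ratings: Ratings = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 5, "m2": 3, "m4": 4},
    "carol": {"m1": 1, "m2": 5, "m4": 2},
}
print(predict_rating(ratings, "alice", "m4"))
```

With k=1 only bob's rating is kept (his ratings match alice's on m1 and m2),
so the prediction for m4 is 4.0; with k=2 carol's lower rating pulls it down.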

I have done distributed programming in C/MPI for an architecture class and
implemented the "Top Ranked Phrases in Corpus" project, which required finding
the top R phrases of word length N in the GigaWord corpus. The project details
and the project report can be found at
http://trpc.sourceforge.net/trpc/Home.html
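
The core of that phrase-counting task fits the MapReduce model. The sketch
below simulates the map, shuffle, and reduce phases in a single process: the
mapper emits (phrase, 1) for every N-word window, the shuffle groups values
by phrase, and the reducer sums them before the top R phrases are selected.
It is not the TRPC code (which used C/MPI); whitespace tokenization,
lowercasing, and the function names are hypothetical choices for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Phrase = Tuple[str, ...]


def map_phase(document: str, n: int) -> Iterator[Tuple[Phrase, int]]:
    """Mapper: emit (phrase, 1) for every window of n consecutive words."""
    words = document.lower().split()
    for i in range(len(words) - n + 1):
        yield tuple(words[i:i + n]), 1


def shuffle(pairs: Iterable[Tuple[Phrase, int]]) -> Dict[Phrase, List[int]]:
    """Shuffle: group the emitted values by key, as the framework would."""
    grouped: Dict[Phrase, List[int]] = defaultdict(list)
    for phrase, count in pairs:
        grouped[phrase].append(count)
    return grouped


def reduce_phase(grouped: Dict[Phrase, List[int]]) -> Dict[Phrase, int]:
    """Reducer: sum the counts for each phrase."""
    return {phrase: sum(counts) for phrase, counts in grouped.items()}


def top_phrases(documents: Iterable[str], n: int, r: int) -> List[Tuple[Phrase, int]]:
    pairs = (pair for doc in documents for pair in map_phase(doc, n))
    counts = reduce_phase(shuffle(pairs))
    # Highest count first; ties broken alphabetically so the output is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:r]


docs = ["the cat sat on the mat", "the cat ran"]
print(top_phrases(docs, n=2, r=2))
```

Here the bigram ("the", "cat") occurs in both documents and is ranked first.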

I do not have extensive experience with Hadoop, but I have watched Google's
MapReduce videos and am reading the paper [4] to learn more about MapReduce.

My Questions:

Is anyone working on the SVD part, or are there any SVD implementations on
Hadoop? If there are, I would like to implement the matrix factorization
methods described in [1], [2], and [3]. I found one paper on a parallel
implementation for the Netflix Prize problem [5], but I am not sure whether
anyone already has an implementation in place. If there is no implementation
of SVD on Hadoop and no one else is working on one, I would like to implement
it, and then implement the method described in [1] as one of the recommenders
for the Netflix problem.
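
For reference, the sketch below shows the kind of regularized matrix
factorization that [1] and [3] describe: each user and each movie gets a
k-dimensional factor vector, the predicted rating is their dot product, and
stochastic gradient descent on each observed rating moves both vectors toward
lower squared error while an L2 penalty (`reg`) shrinks them. It updates all
k factors at once rather than one at a time as in [3], and omits the bias
terms and other refinements of [1]; the learning rate, penalty, factor count,
and initialization are illustrative assumptions, not tuned values.

```python
import math
import random
from typing import List, Tuple

Rating = Tuple[int, int, float]  # (user index, movie index, rating)
Matrix = List[List[float]]


def train_factors(ratings: List[Rating], n_users: int, n_items: int, k: int = 2,
                  lr: float = 0.01, reg: float = 0.02, epochs: int = 500,
                  seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Regularized matrix factorization fitted by stochastic gradient descent."""
    rng = random.Random(seed)
    # Small random initial factors; all-zero factors would never receive a gradient.
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pu, qi = P[u], Q[i]
            err = r - sum(a * b for a, b in zip(pu, qi))
            for f in range(k):
                puf, qif = pu[f], qi[f]  # pre-update values for both gradients
                pu[f] += lr * (err * qif - reg * puf)
                qi[f] += lr * (err * puf - reg * qif)
    return P, Q


def predict(P: Matrix, Q: Matrix, u: int, i: int) -> float:
    """Predicted rating: dot product of the user and movie factor vectors."""
    return sum(a * b for a, b in zip(P[u], Q[i]))


def rmse(P: Matrix, Q: Matrix, ratings: List[Rating]) -> float:
    """Root-mean-square error over the given ratings."""
    se = sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings)
    return math.sqrt(se / len(ratings))


data = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0),
        (2, 1, 1.0), (2, 2, 5.0), (3, 0, 4.0), (3, 2, 4.0)]
P, Q = train_factors(data, n_users=4, n_items=3)
print(round(rmse(P, Q, data), 3), round(predict(P, Q, 0, 2), 2))
```

This per-rating update loop is inherently sequential, which is one reason a
Hadoop version is not a direct port; the parallel implementation in [5] uses
alternating least squares instead.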

I have worked on the Netflix Prize problem, so most of my suggested
algorithms revolve around it, but I am open to other algorithms that might be
out there. Does this sound like a good thing to do?


References:

1. A. Paterek. Improving regularized singular value decomposition for
collaborative filtering. In Proc. of KDD Cup Workshop at SIGKDD'07, 13th ACM
Int. Conf. on Knowledge Discovery and Data Mining, pages 39-42, San Jose, CA,
USA, 2007.

2. Gábor Takács, István Pilászy, Bottyán Németh, and Domonkos Tikk. Scalable
Collaborative Filtering Approaches for Large Recommender Systems. JMLR,
10(Mar):623-656, 2009.

3. http://sifter.org/~simon/journal/20061211.html

4. J. Dean and S. Ghemawat. MapReduce: Simplified Data Processing on Large
Clusters. In OSDI'04: Sixth Symposium on Operating System Design and
Implementation, San Francisco, CA, December 2004.

5. Yunhong Zhou, Dennis Wilkinson, Robert Schreiber, and Rong Pan.
Large-Scale Parallel Collaborative Filtering for the Netflix Prize. In
Algorithmic Aspects in Information and Management, LNCS Volume 5034/2008.
(I could not find the full citation for this paper; it is available at
http://www.springerlink.com/content/j1076u0h14586183/ )

-- 
Regards,
Atul Kulkarni
Teaching Assistant,
Department of Computer Science,
University of Minnesota Duluth
Duluth. 55805.
