Re: Reg. Netflix Prize Apache Mahout GSoC Application (SVD option)

Jake Mannix Mon, 05 Apr 2010 13:56:31 -0700

Hi Richard,

  A few notes about what would be required to get a nice distributed SVD
recommender in Mahout. If you look at the current distributed recommenders
(in the org.apache.mahout.cf.taste.hadoop package and its subpackages), you
can see how they work: using HDFS-backed data, a batch of recommendations is
computed for all users at once, and the output is another HDFS file, which
can then be used by a recommender.
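  A toy, single-machine sketch of that batch pattern follows. It assumes an
in-memory map of user preferences and a caller-supplied scoring function.
The class BatchRecommendSketch, its methods, and the tab-separated line
format are hypothetical illustrations, not Mahout's API or its real output
format. A real job would read and write HDFS through Hadoop rather than
pass lists of strings around.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.ToDoubleBiFunction;

/**
 * Hypothetical illustration of the batch pattern: precompute top-N items for
 * every user, emit one text line per user ("userID<TAB>item,item,..."; format
 * invented for this sketch), then serve requests by looking users up in that output.
 */
public class BatchRecommendSketch {

    /** One output line per user, listing up to howMany unrated items by descending score. */
    static List<String> recommendAll(Map<Long, Map<Long, Double>> prefs, List<Long> allItems,
                                     ToDoubleBiFunction<Long, Long> score, int howMany) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<Long, Map<Long, Double>> entry : new TreeMap<>(prefs).entrySet()) {
            Long user = entry.getKey();
            List<Long> candidates = new ArrayList<>();
            for (Long item : allItems)
                if (!entry.getValue().containsKey(item)) candidates.add(item); // skip rated items
            candidates.sort(Comparator.comparingDouble((Long item) -> -score.applyAsDouble(user, item)));
            List<String> top = new ArrayList<>();
            for (Long item : candidates.subList(0, Math.min(howMany, candidates.size())))
                top.add(item.toString());
            lines.add(user + "\t" + String.join(",", top));
        }
        return lines;
    }

    /** Parses the batch output back into userID -> recommended item IDs for serving. */
    static Map<Long, List<Long>> load(List<String> lines) {
        Map<Long, List<Long>> recs = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.split("\t", -1); // keep an empty item list for users with no recs
            List<Long> items = new ArrayList<>();
            if (!parts[1].isEmpty())
                for (String id : parts[1].split(",")) items.add(Long.parseLong(id));
            recs.put(Long.parseLong(parts[0]), items);
        }
        return recs;
    }

    public static void main(String[] args) {
        Map<Long, Map<Long, Double>> prefs = new HashMap<>();
        prefs.put(1L, Map.of(10L, 5.0));
        prefs.put(2L, Map.of(11L, 3.0));
        // Hypothetical scorer for the demo: higher item IDs always score higher.
        ToDoubleBiFunction<Long, Long> score = (user, item) -> item.doubleValue();
        List<String> lines = recommendAll(prefs, List.of(10L, 11L, 12L), score, 2);
        lines.forEach(System.out::println);
        System.out.println(load(lines).get(1L));
    }
}
```

  With the demo scorer, user 1 (who rated item 10) gets items 12 and 11,
and the parsed lookup returns [12, 11]. Splitting "compute everything" from
"look it up later" is what lets the expensive part run as an offline batch.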


  A distributed SVD recommender would work roughly the same way. There
should be a class with a main() method that fires off a Hadoop job to
transform the data into sparse vector form, applying any weighting or other
modification to the matrix (such as normalization). It would then use the
current DistributedLanczosSolver class to compute the SVD, and use the
resulting singular vectors and values to re-fill the original matrix
(filling in its missing entries) and find items to recommend (see the
current in-memory SVDRecommender to get a feel for how that part is done).
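  To make that last step concrete, here is a small in-memory sketch of the
same pipeline on a dense toy matrix: mean-center each user's known ratings
(0 means unrated), compute a rank-k truncated SVD, rebuild the predicted
scores, and return the highest-scoring unrated items. It swaps in plain
power iteration with deflation for the Lanczos method, and simply treats
unrated cells as zero after centering. The class and method names are
hypothetical, and this is not a description of Mahout's SVDRecommender.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

/** Hypothetical toy SVD recommender: center, factor (rank k), re-fill, pick top unrated items. */
public class SvdRecommenderSketch {

    /** Top-k singular triplets: a is approximated by the sum over c of sigma[c] * u[c] * v[c]^T. */
    static final class Svd {
        final double[] sigma;
        final double[][] u; // k left singular vectors, each of length m (users)
        final double[][] v; // k right singular vectors, each of length n (items)
        Svd(double[] sigma, double[][] u, double[][] v) {
            this.sigma = sigma; this.u = u; this.v = v;
        }
    }

    /** Rank-k SVD via power iteration on R^T R with deflation (a stand-in for Lanczos). */
    static Svd truncatedSvd(double[][] a, int k, int iterations) {
        int m = a.length, n = a[0].length;
        double[][] r = new double[m][];
        for (int i = 0; i < m; i++) r[i] = a[i].clone(); // residual matrix, deflated as we go
        double[] sigma = new double[k];
        double[][] u = new double[k][m], v = new double[k][n];
        Random rnd = new Random(42); // fixed seed so runs are reproducible
        for (int c = 0; c < k; c++) {
            double[] x = new double[n];
            for (int j = 0; j < n; j++) x[j] = rnd.nextDouble() + 0.1;
            for (int it = 0; it < iterations; it++) {
                double[] z = multiplyTransposed(r, multiply(r, x)); // z = R^T R x
                double len = norm(z);
                if (len == 0) break; // residual is zero in this direction
                for (int j = 0; j < n; j++) x[j] = z[j] / len;
            }
            double[] y = multiply(r, x);
            double s = norm(y);
            if (s < 1e-12) break; // remaining singular values are numerically zero
            sigma[c] = s;
            for (int i = 0; i < m; i++) u[c][i] = y[i] / s;
            v[c] = x;
            // Deflate: subtract s * u * v^T so the next pass finds the next triplet.
            for (int i = 0; i < m; i++)
                for (int j = 0; j < n; j++) r[i][j] -= s * u[c][i] * x[j];
        }
        return new Svd(sigma, u, v);
    }

    /** Mean-center known ratings, re-fill via rank-k SVD, return the user's top unrated items. */
    static List<Integer> recommend(double[][] ratings, int user, int k, int howMany) {
        int m = ratings.length, n = ratings[0].length;
        double[] mean = new double[m];
        double[][] centered = new double[m][n]; // unrated cells stay 0 after centering
        for (int i = 0; i < m; i++) {
            double sum = 0;
            int count = 0;
            for (double x : ratings[i]) if (x != 0) { sum += x; count++; }
            mean[i] = count == 0 ? 0 : sum / count;
            for (int j = 0; j < n; j++)
                if (ratings[i][j] != 0) centered[i][j] = ratings[i][j] - mean[i];
        }
        Svd svd = truncatedSvd(centered, k, 200);
        double[] predicted = new double[n];
        List<Integer> candidates = new ArrayList<>();
        for (int j = 0; j < n; j++) {
            predicted[j] = mean[user]; // add the user's mean back to the low-rank estimate
            for (int c = 0; c < k; c++) predicted[j] += svd.sigma[c] * svd.u[c][user] * svd.v[c][j];
            if (ratings[user][j] == 0) candidates.add(j);
        }
        candidates.sort(Comparator.comparingDouble((Integer j) -> -predicted[j]));
        return new ArrayList<>(candidates.subList(0, Math.min(howMany, candidates.size())));
    }

    static double[] multiply(double[][] a, double[] x) {
        double[] y = new double[a.length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < x.length; j++) y[i] += a[i][j] * x[j];
        return y;
    }

    static double[] multiplyTransposed(double[][] a, double[] y) {
        double[] z = new double[a[0].length];
        for (int i = 0; i < a.length; i++)
            for (int j = 0; j < z.length; j++) z[j] += a[i][j] * y[i];
        return z;
    }

    static double norm(double[] x) {
        double s = 0;
        for (double d : x) s += d * d;
        return Math.sqrt(s);
    }

    public static void main(String[] args) {
        double[][] ratings = { // rows = users, columns = items, 0 = not yet rated
            {5, 0, 0, 1},
            {4, 5, 1, 0},
            {1, 0, 5, 4},
            {0, 1, 4, 5},
        };
        for (int user = 0; user < ratings.length; user++)
            System.out.println("user " + user + " -> " + recommend(ratings, user, 2, 1));
    }
}
```

  Treating unrated cells as zero after centering is the simplest choice; it
pulls predictions toward each user's mean, which is one reason the
weighting/normalization step matters. In the distributed version, the SVD
call would be replaced by DistributedLanczosSolver and the loops by Hadoop jobs.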

  I hope this helps you understand a little of the scope of this part
of the project.

  -jake


On Mon, Apr 5, 2010 at 1:38 PM, Richard Simon Just <
i...@richardsimonjust.co.uk> wrote:

> Thanks for the super speedy response!
>
> Going on from what you said, I've been reading up on the different
> SVD-based variants used throughout the Netflix competition and working on
> my proposal. As you suggested, I'm focussing purely on the SVD-based
> recommender, with the possibility of also optimizing the SVD code.
>
> When it comes to the proposal, what sort of background detail should I go
> into? For example, should I be talking about the use of SVD in a
> recommender setting? Or, given that the mentors already know this, should
> I discuss only the sort of SVD-based recommender implementation I'm
> planning? A related question: am I aiming the proposal at people who are
> familiar with Mahout and machine learning, or at other people as well?
>
> Many thanks
> Richard
>
> Sean Owen wrote:
> > It'd be a matter of making a brand-new distributed recommender. It
> > need not have anything to do with SVDRecommender, which is a fine but
> > separate non-parallel implementation.
> >
> > Tacking on distributed slope-one is fairly easy, I think. Both
> > together, with testing, documentation, etc. are certainly big enough
> > for a GSoC project, probably a bit too large.
> >
> > I'd be pleased to see someone do a quite thorough job with an
> > SVD-based recommender, and perhaps along the way analyzing and
> > optimizing the SVD impl itself, and documenting and testing well and
> > so on. That's a nice project IMHO.
> >
> > On Thu, Apr 1, 2010 at 6:41 PM, Richard Simon Just
> > <i...@richardsimonjust.co.uk> wrote:
> >
> >> Just looking for some clarification. As a GSoC project would the SVD
> >> option mentioned below be a case of integrating the distributed SVD of
> >> MAHOUT-180 with the existing SVDRecommender?
> >>
> >> If so, is there still a full GSoC project there? Or would I need to
> >> combine it with, say, making the slope-one recommender fully distributed
> >> too?
> >>
> >> Many Thanks
> >> Richard
> >>
> >>
> >
> >
>
