SVD or a cousin was a very common feature among the leading netflix
entries.  SVD is, indeed, very slow if you do a complete decomposition.  The
point, of course, for large sparse matrices is that you want an
approximation so you only compute the first few singular vectors/values.  To
do this efficiently on large sparse matrices you need to use something akin
to Lanczos algorithm or a good solver as Jason suggests.  The key trick here
is to only compute a few singular vectors and to avoid fill-in of the
original matrix.

The key step in either the solver or Lanczos algorithms is multiplication by
the observation matrix.  This is easily parallelized and by far dominates
the computation.

One really interesting parallel is that the Gibb's sampler for MDCA or LDA
has a very similar structure to power methods like Lanczos algorithm, just
with a different cost function.  Since these probabilistic approaches are
known to be considerably better than LSA that is an exciting parallel.  The
improvement from LSA to LDA is similar to the improvement from Pearson's
chi^2 tests to G^2 as I described in my paper mentioned above.  The deal is
that LSA (and Pearson's chi^2) both use an assumption of normal errors that
is unjustified for most small count applications.  Recommendation systems
are definitely in the small count regime.

On Thu, Mar 5, 2009 at 5:52 AM, Jason Rennie <[email protected]> wrote:

> On Thu, Mar 5, 2009 at 4:24 AM, Sean Owen <[email protected]> wrote:
>
> > This would be a fantastic project, implementing a Recommender based on
> > this approach . I tried implementing an SVD technique a couple years
> > ago and it was waaay too slow on one machine. Revisiting with Hadoop
> > sounds great.
>
>
> SVD (at least how it is used for CF---matrix approximation) is really just
> least squares, which is not hard (quite simple if you have a Non-linear
> Conjugate Gradients solver) and easy to regularize to get even better
> answers.  I'd be happy to discuss further if anyone is interested (did a
> large portion of my PhD work in this area).
>
> Cheers,
>
> Jason
>
> --
> Jason Rennie
> Research Scientist, ITA Software
> http://www.itasoftware.com/
>



-- 
Ted Dunning, CTO
DeepDyve

111 West Evelyn Ave. Ste. 202
Sunnyvale, CA 94086
www.deepdyve.com
408-773-0110 ext. 738
858-414-0013 (m)
408-773-0220 (fax)

Reply via email to