SVD or a cousin was a very common feature among the leading netflix entries. SVD is, indeed, very slow if you do a complete decomposition. The point, of course, for large sparse matrices is that you want an approximation so you only compute the first few singular vectors/values. To do this efficiently on large sparse matrices you need to use something akin to Lanczos algorithm or a good solver as Jason suggests. The key trick here is to only compute a few singular vectors and to avoid fill-in of the original matrix.
The key step in either the solver or Lanczos algorithms is multiplication by the observation matrix. This is easily parallelized and by far dominates the computation. One really interesting parallel is that the Gibb's sampler for MDCA or LDA has a very similar structure to power methods like Lanczos algorithm, just with a different cost function. Since these probabilistic approaches are known to be considerably better than LSA that is an exciting parallel. The improvement from LSA to LDA is similar to the improvement from Pearson's chi^2 tests to G^2 as I described in my paper mentioned above. The deal is that LSA (and Pearson's chi^2) both use an assumption of normal errors that is unjustified for most small count applications. Recommendation systems are definitely in the small count regime. On Thu, Mar 5, 2009 at 5:52 AM, Jason Rennie <[email protected]> wrote: > On Thu, Mar 5, 2009 at 4:24 AM, Sean Owen <[email protected]> wrote: > > > This would be a fantastic project, implementing a Recommender based on > > this approach . I tried implementing an SVD technique a couple years > > ago and it was waaay too slow on one machine. Revisiting with Hadoop > > sounds great. > > > SVD (at least how it is used for CF---matrix approximation) is really just > least squares, which is not hard (quite simple if you have a Non-linear > Conjugate Gradients solver) and easy to regularize to get even better > answers. I'd be happy to discuss further if anyone is interested (did a > large portion of my PhD work in this area). > > Cheers, > > Jason > > -- > Jason Rennie > Research Scientist, ITA Software > http://www.itasoftware.com/ > -- Ted Dunning, CTO DeepDyve 111 West Evelyn Ave. Ste. 202 Sunnyvale, CA 94086 www.deepdyve.com 408-773-0110 ext. 738 858-414-0013 (m) 408-773-0220 (fax)
