Hello Dmitriy,

> i came very close to try out Ted's layout for stochastic svd (the way i
> understood it, with block QR solvers on mapper side instead of gram-schmidt
> on the whole scale as Tropp seems to suggest ) and not following mahout's
> general architecture, but I wasn't actually able to  carve out enough time
> for this. But sooner or later somebody will implement that, and then stuff

you are right -- I kicked myself into finally implementing that paper after 
seeing Daisuke Okanohara's "redsvd" :-)

* http://code.google.com/p/redsvd/wiki/English , in-core stochastic 
decomposition on top of eigen3, a beautiful C++ template library

In any case, it looks like power iterations are needed even for mid-sized 
problems like Wikipedia. Either that, or use very aggressive over-sampling 
(I use #samples=2*requested_rank at the moment, and it's not enough). 
That means doing extra passes over the data (two per power iteration, one 
for A^T and one for A) to compute Y=(A*A^T)^q*A*G instead of just Y=A*G 
:-( ... unless there is a clever way to avoid it.
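
To make the cost concrete, here is a minimal in-core NumPy sketch of the two 
sampling schemes. It is only an illustration under stated assumptions, not the 
gensim or redsvd code: the function name `sample_range` and its parameters are 
hypothetical, A is assumed to fit in memory as a dense array, and the QR step 
between iterations is a common stabilization that, in exact arithmetic, leaves 
the span of (A*A^T)^q*A*G unchanged.

```python
import numpy as np

def sample_range(A, rank, oversample=2, q=0, seed=0):
    """Orthonormal basis Q for the approximate range of A (hypothetical helper).

    Uses oversample * rank random samples and q power iterations.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    k = min(oversample * rank, m, n)   # number of samples, e.g. 2 * requested rank
    G = rng.standard_normal((n, k))    # Gaussian test matrix
    Y = A @ G                          # Y = A*G: a single pass over A
    for _ in range(q):
        Q, _ = np.linalg.qr(Y)         # re-orthonormalize to limit round-off
        Y = A @ (A.T @ Q)              # apply A*A^T: two more passes over A
    Q, _ = np.linalg.qr(Y)
    return Q
```

With q=0 this reads A once; each power iteration adds one product with A^T and 
one with A, which is where the extra passes come from. The quality of a basis 
can be checked with the spectral norm of A - Q*Q^T*A.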

Radim

> On Sun, Sep 5, 2010 at 7:33 PM, RadimRehurek <[email protected]> wrote:
> 
> > See that module's docstring; reading the input is slower than processing it
> > with the stochastic decomposition.
> >
> > In short: in order for distributed computing to make sense
> > (performance-wise), the data would already need to be pre-distributed, too.
> >
> > This is true in Hadoop, so I guess stochastic decomposition is an algo
> > where MAHOUT could really make a difference on terabyte+ problems.
> >
> > Radim
> >
> 
> 
> 
