You may want to take a look at MAHOUT-376.

https://issues.apache.org/jira/browse/MAHOUT-376


On Sun, Sep 5, 2010 at 10:37 PM, RadimRehurek <[email protected]> wrote:

> Hello Dmitriy,
>
> > I came very close to trying out Ted's layout for stochastic SVD (the way I
> > understood it, with block QR solvers on the mapper side instead of
> > Gram-Schmidt on the whole scale, as Tropp seems to suggest) and not
> > following Mahout's general architecture, but I wasn't actually able to
> > carve out enough time for this. But sooner or later somebody will
> > implement that, and then stuff
>
> You are right -- I kicked myself into finally implementing that paper after
> seeing Daisuke Okanohara's "redsvd" :-)
>
> * http://code.google.com/p/redsvd/wiki/English , in-core stochastic
> decomposition on top of eigen3, a beautiful C++ template library
>
> In any case, it looks like the power iterations are really needed already
> for mid-sized problems like Wikipedia. Either that, or use very aggressive
> over-sampling (I use #samples=2*requested_rank at the moment, and it's not
> enough). That means doing extra passes over the data to compute
> Y=(A*A^T)^q*A*G instead of just Y=A*G :-( ... unless there is a clever way
> to avoid it.
>
> Radim
>
> > On Sun, Sep 5, 2010 at 7:33 PM, RadimRehurek <[email protected]> wrote:
> >
> > > See that module's docstring; reading the input is slower than
> > > processing it with the stochastic decomposition.
> > >
> > > In short: in order for distributed computing to make sense
> > > (performance-wise), the data would already need to be pre-distributed,
> > > too.
> > >
> > > This is true in Hadoop, so I guess stochastic decomposition is an
> > > algorithm where Mahout could really make a difference on terabyte+
> > > problems.
> > >
> > > Radim
> > >
> >
> >
> >
>
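
To make the pass count in the quoted message concrete, here is a minimal in-core sketch of forming Y=(A*A^T)^q*A*G versus plain Y=A*G. It is only an illustration under my own assumptions: the names sample_range, matmul and transpose are hypothetical, the matrices are plain Python lists of rows, and each product with A or A^T is counted as one pass over the data. It is not the code from MAHOUT-376, redsvd, or Radim's module.

```python
import random


def matmul(X, Y):
    """Plain dense product of two matrices stored as lists of rows."""
    cols = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in X]


def transpose(X):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]


def sample_range(A, num_samples, q, seed=0):
    """Return (Y, passes) with Y = (A*A^T)^q * A * G for a Gaussian G.

    Hypothetical helper: every product with A or A^T is counted as one
    pass over the input, so q power iterations cost 2*q extra passes.
    """
    rng = random.Random(seed)
    n = len(A[0])
    # Random Gaussian test matrix G with one column per sample.
    G = [[rng.gauss(0.0, 1.0) for _ in range(num_samples)] for _ in range(n)]
    At = transpose(A)

    Y = matmul(A, G)        # first pass: Y = A*G
    passes = 1
    for _ in range(q):
        Z = matmul(At, Y)   # one more pass: A^T * Y
        Y = matmul(A, Z)    # one more pass: A * (A^T * Y)
        passes += 2
    return Y, passes
```

With the setting mentioned above, the call would look like sample_range(A, 2 * requested_rank, q); q=0 gives the single-pass Y=A*G, and each increment of q adds two passes. The sketch leaves out any orthonormalization of Y between the products, which I would expect a real implementation to need for numerical stability.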
