Most definitely so.

I came very close to trying out Ted's layout for stochastic SVD (the way I
understood it, with block QR solvers on the mapper side instead of
Gram-Schmidt over the whole matrix, as Tropp seems to suggest), departing
from Mahout's general architecture, but I wasn't actually able to carve out
enough time for it. Sooner or later somebody will implement that, and then
things like LSI at massive scale will become very much a reality, no longer
marred by the 'it's slow' stigma. Truly an opportunity for someone to shine.
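For readers unfamiliar with the technique being discussed: below is a minimal single-machine sketch of the stochastic SVD idea (in the Halko/Martinsson/Tropp style) — random projection, QR orthonormalization of the sketch, then an exact SVD of the small projected matrix. This says nothing about Ted's actual layout or Mahout's implementation; the function name and parameters are purely illustrative.

```python
import numpy as np

def randomized_svd(A, k, n_oversamples=10, n_power_iters=2, seed=0):
    """Illustrative stochastic SVD: approximate the top-k singular triples
    of A by sketching its range with a random projection."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    l = k + n_oversamples
    # Random test matrix. In a MapReduce setting, each mapper would multiply
    # its block of rows by Omega and run a local block QR, rather than doing
    # Gram-Schmidt over the whole matrix.
    Omega = rng.standard_normal((n, l))
    Y = A @ Omega
    # Optional power iterations sharpen the subspace estimate when the
    # singular values decay slowly.
    for _ in range(n_power_iters):
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)            # orthonormal basis approximating range(A)
    B = Q.T @ A                       # small l-by-n matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :k], s[:k], Vt[:k, :]
```

The only pass over the full matrix is the multiplication by the random test matrix (plus the optional power iterations), which is what makes the approach attractive for distributed, I/O-bound settings like the one described above.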

On Sun, Sep 5, 2010 at 7:33 PM, RadimRehurek <[email protected]> wrote:

> See that module's docstring; reading the input is slower than processing it
> with the stochastic decomposition.
>
> In short: in order for distributed computing to make sense
> (performance-wise), the data would already need to be pre-distributed, too.
>
> This is true in Hadoop, so I guess stochastic decomposition is an algo
> where MAHOUT could really make a difference on terabyte+ problems.
>
> Radim
>
