Most definitely so. I came very close to trying out Ted's layout for stochastic SVD (the way I understood it, with block QR solvers on the mapper side instead of Gram-Schmidt over the whole matrix, as Tropp seems to suggest), not following Mahout's general architecture, but I wasn't able to carve out enough time for it. Sooner or later somebody will implement it, though, and then things like LSI at massive scale will become a reality, no longer marred by the 'being slow' stigma. Truly an opportunity for someone to shine.
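For context, here is a minimal single-machine sketch of the stochastic (randomized) SVD idea being discussed, in the Halko/Martinsson/Tropp style; the function name and parameters are my own illustration, not Mahout's or Ted's actual code. The distributed variant would replace the one-shot QR below with blockwise QR runs on mappers:

```python
import numpy as np

def stochastic_svd(A, rank, oversample=10, seed=0):
    """Randomized SVD sketch: project, orthonormalize, decompose small."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    k = rank + oversample
    # Random Gaussian test matrix; A @ Omega captures the range of A.
    Omega = rng.standard_normal((n, k))
    Y = A @ Omega
    # Orthonormalize the sketch. In a MapReduce layout this is the step
    # that would be done as block QR on the mapper side instead of a
    # full-scale Gram-Schmidt pass.
    Q, _ = np.linalg.qr(Y)
    # Project A into the small subspace and do an exact SVD there.
    B = Q.T @ A
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :rank], s[:rank], Vt[:rank, :]
```

Since only matrix-vector products with A and a small dense SVD are needed, the heavy work is embarrassingly parallel over row blocks, which is exactly why it maps well onto Hadoop.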
On Sun, Sep 5, 2010 at 7:33 PM, RadimRehurek <[email protected]> wrote:
> See that module's docstring; reading the input is slower than processing it
> with the stochastic decomposition.
>
> In short: in order for distributed computing to make sense
> (performance-wise), the data would already need to be pre-distributed, too.
>
> This is true in Hadoop, so I guess stochastic decomposition is an algo
> where MAHOUT could really make a difference on terabyte+ problems.
>
> Radim
