Re: [jira] Updated: (MAHOUT-376) Implement Map-reduce version of stochastic SVD

Dmitriy Lyubimov Wed, 13 Oct 2010 10:37:13 -0700

I see. Very interesting.

The only problem (well, something i perceive as a problem) is not even that B
got inflated so much, but rather that the reduced SVD problem is not a
(k+p)x(k+p) problem anymore. There are two things here:


-- user doesn't really set actual precision anymore (k+p was supposed to be
the lever);

-- the reduced svd problem dimensions are now ~m. Initially i thought the
philosophy behind that was that we want to be solving a streaming problem of
m x n size and reduce it to a problem that doesn't depend on m or n, and
memory-wise n is only constrained by our memory settings on the mapper. In
reality, under these circumstances, n can easily be 1E6 (8 MB dense vector) or
more (default hadoop mapper setting -Xmx200m). m is not bound at all by
memory constraints (i.e. streaming goes along m). So in the example i thought
of to try that, m x n can be 1E9x1E6, i.e. a petabyte scale problem (sort of an
SVD version of a Terasort benchmark). But i guess if BBt dimensions are now
~s(k+p), where s~m, then it is not true anymore and m is theoretically
bounded as well (whether it is a practical issue or not is not my point;
most likely it is not).
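
For reference, a back-of-the-envelope sketch of the sizes mentioned above.
It assumes dense storage at 8 bytes per double; the k+p and s values are
made up purely for illustration and do not come from the patch.

```python
# Rough sizes for the numbers discussed above.
# Assumptions (not from the thread): dense storage, 8 bytes per double,
# and hypothetical values for k+p and s.

BYTES_PER_DOUBLE = 8


def dense_vector_bytes(length):
    """Memory needed for one dense vector of doubles."""
    return length * BYTES_PER_DOUBLE


def dense_matrix_bytes(rows, cols):
    """Memory needed for a dense rows x cols matrix of doubles."""
    return rows * cols * BYTES_PER_DOUBLE


n = 10**6        # row length; one row has to fit in mapper memory
m = 10**9        # number of rows; these are streamed
k_plus_p = 100   # hypothetical k+p
s = 1000         # hypothetical inflation factor for the BBt side

row_bytes = dense_vector_bytes(n)                 # one dense input row
input_bytes = dense_matrix_bytes(m, n)            # whole input, if fully dense
bbt_small = dense_matrix_bytes(k_plus_p, k_plus_p)          # (k+p) x (k+p)
bbt_big = dense_matrix_bytes(s * k_plus_p, s * k_plus_p)    # s(k+p) x s(k+p)

print(f"one dense row:        {row_bytes / 1e6:.0f} MB")
print(f"dense m x n input:    {input_bytes / 1e15:.0f} PB")
print(f"BBt at (k+p):         {bbt_small / 1e3:.0f} KB")
print(f"BBt at s(k+p):        {bbt_big / 1e9:.0f} GB")
```

With these made-up numbers the (k+p)x(k+p) matrix is tiny regardless of m,
while a side of s(k+p) makes the reduced problem grow with s squared.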

This kind of shifts the weight of computation from the MR side to what i think
is a single threaded eigensolver. I would like to spend just a little more
time poking around to see if there's still a way to make MR work a
little bit harder.
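
To put a rough number on the eigensolver concern: a dense symmetric
eigendecomposition of a d x d matrix costs on the order of d^3 operations.
The sketch below only compares relative costs under that cubic model; the
k+p and s values are hypothetical, and constant factors are ignored.

```python
# Relative cost of the reduced eigenproblem under a simple d^3 model.
# Assumptions (not from the thread): cubic cost, hypothetical k+p and s.

def eigen_cost(d):
    """Order-of-magnitude operation count for a dense d x d eigensolver."""
    return d ** 3


def relative_cost(s, k_plus_p):
    """Cost of an s(k+p)-sided problem relative to a (k+p)-sided one."""
    return eigen_cost(s * k_plus_p) // eigen_cost(k_plus_p)


k_plus_p = 100  # hypothetical k+p
for s in (1, 10, 1000):  # hypothetical inflation factors
    side = s * k_plus_p
    print(f"s={s}: BBt side {side}, relative cost {relative_cost(s, k_plus_p)}x")
```

Under this model the single threaded part grows with the cube of s, which
is the work that would ideally stay on the MR side.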

-d

On Tue, Oct 12, 2010 at 11:10 PM, Ted Dunning <[email protected]> wrote:

> I don't think it would be a big problem to have this dependency, but I would
> prefer to simply port the eigenvalue/svd decomposition from math to use our
> vectors directly.  We need such a port and they have tests for it already.
> I am pretty sure that CM's svd is higher quality than Colt's in any case.
>
> If there is a way to use our vectors and commons math's code, that would be
> lovely.  I kind of doubt it, however.
>
> On Tue, Oct 12, 2010 at 7:37 PM, Dmitriy Lyubimov <[email protected]>
> wrote:
>
> > -- i also ended up using eigen from apache commons math 2.1. The math
> > module has a dependency on it, but the core module (which is also math
> > heavy) doesn't have such a dependency. Is it a big deal if we add one to
> > the core module too?
> >
>
