The docs look great, Dmitriy. Has anyone considered giving the oversampling parameter p a default value? Say p = 25. That is slightly high, but I imagine most use cases are noisy and could benefit from the larger value.

I have been testing SSVD and Lanczos SVD on Amazon EMR. I'm seeing about a 15x speedup of SSVD over Lanczos, which is promising. I'm trying to scale out horizontally but not seeing any difference between using one slave and many slaves. Any ideas? (I won't go into detail about the setup here, but if this sounds familiar I'd like to talk more.)

The basic problem with Lanczos in a distributed environment seems to be that a matrix-vector multiply is not enough work to offset the setup costs. Also, there is no distributed orthogonalization with Lanczos, and I'm getting OOMs, which makes it difficult to scale.

I would still like to contribute the results I have found, but I'm short on time, so nothing besides work directly related to completing my thesis will happen until that is done.
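For anyone following along who hasn't seen what the oversampling parameter p does, here is a minimal NumPy sketch of the basic randomized SVD idea. It is not Mahout's SSVD code and has no power iterations; the function name randomized_svd and its arguments are made up for illustration, and p = 25 is just the default suggested above.

```python
import numpy as np


def randomized_svd(a, k, p=25, seed=0):
    """Rank-k randomized SVD sketch with oversampling p (illustrative only)."""
    rng = np.random.default_rng(seed)
    m, n = a.shape
    # Sample k + p random directions, capped at the matrix dimensions.
    width = min(k + p, m, n)
    omega = rng.standard_normal((n, width))
    # Orthonormal basis for the sampled range of a.
    q, _ = np.linalg.qr(a @ omega)
    # Project a onto that basis and take the exact SVD of the small matrix.
    b = q.T @ a
    u_b, s, vt = np.linalg.svd(b, full_matrices=False)
    u = q @ u_b
    # Keep only the k leading triplets; the extra p only improve accuracy.
    return u[:, :k], s[:k], vt[:k, :]
```

Running this with p = 0 and p = 25 on a noisy matrix and comparing the singular values against np.linalg.svd is one way to see what the extra samples buy.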

On Fri, Nov 25, 2011 at 5:37 PM, Dmitriy Lyubimov <[email protected]> wrote:

> I attached the latex source as well (lyx, actually). I would've used
> Wiki if it supported mathjax. So anyone can modify the usage if need
> be. (Anyone who has lyx anyway).
>
> Dev docs were attached to several jira issues (and i had blog
> entries), if you want to move more recent copies of them moved over
> to wiki, i'd be happy to. Mainly, so far there are 2 working notes,
> one for original method, and another for power iterations, attached to
> corresponding jiras.
>
>
> On Fri, Nov 25, 2011 at 4:26 PM, Grant Ingersoll <[email protected]>
> wrote:
> > I hooked it into the Algorithms page.
> >
> > How do you intend to keep the PDF up to date? I like the focus more on
> > the user, but it would also be good to have some dev docs.
> >
> > Also, with both Lanczos and this it would be good if we could hook them
> > into some real examples.
> >
> > On Nov 25, 2011, at 5:42 PM, Dmitriy Lyubimov wrote:
> >
> >> Hi,
> >>
> >> I put a usage and overview doc for SSVD onto wiki. I'd appreciate if
> >> somebody else could look thru it, to scan for completeness and
> >> suggestions.
> >>
> >> I tried to approach it as a user-facing documentation, i.e. I tried to
> >> avoid discussing any implementation specifics .
> >>
> >> I had several users and Nathan Halko trying it out and actually
> >> favorably commenting on its scalability vs. Lanczos but i don't know
> >> first hand of any production use (even our own use is fairly limited
> >> (in terms of input volume we ever processed) and actually somewhat
> >> diverged from this Mahout implementation. Perhaps putting it more in
> >> front of users will help to receive more feedback.
> >>
> >> Thanks.
> >> -Dmitriy
> >
> > --------------------------------------------
> > Grant Ingersoll
> > http://www.lucidimagination.com
