On Tue, Nov 29, 2011 at 9:56 AM, Nathan Halko <[email protected]> wrote:
> The docs look great Dmitriy. Has anyone considered giving oversampling
> parameter p a default value? Say p = 25. Slightly high but I imagine most
> use cases are noisy and could benefit from the larger value.

Yes, that's a good idea that did not occur to me. This parameter could get a default value. But wouldn't a good default also depend on k? Say, if you ask for k = 100 then perhaps p = 15 is enough, but if you ask for k = 500 then 25 sounds about right. Perhaps we could come up with a heuristic for the default, i.e. p = some f(k).

> I have been
> testing ssvd and lanczos svd on Amazon EMR. Seeing about a 15x speedup in
> ssvd over lanczos which is promising. Trying to scale out horizontally but
> not seeing any difference between using one slave or many slaves. Any
> ideas? (I won't go into detail about the setup here but if it sounds familiar
> I'd like to talk more). The basic problem with lanczos in the distributed
> environment seems to be that a matrix-vector multiply is not enough work to
> offset any setup costs; also, there is no distributed orthogonalization
> with lanczos and I'm getting OOMs, making it difficult to scale. I would
> still like to contribute what results I have found but I'm short on time so
> nothing besides work directly related to the completion of my thesis will
> happen until that is done.
>
> On Fri, Nov 25, 2011 at 5:37 PM, Dmitriy Lyubimov <[email protected]> wrote:
>
>> I attached the LaTeX source as well (LyX, actually). I would've used
>> the wiki if it supported MathJax. So anyone can modify the usage doc if need
>> be (anyone who has LyX, anyway).
>>
>> Dev docs were attached to several JIRA issues (and I had blog
>> entries); if you want more recent copies of them moved over
>> to the wiki, I'd be happy to do it. So far there are mainly 2 working notes,
>> one for the original method and another for power iterations, attached to
>> the corresponding JIRAs.
>>
>>
>> On Fri, Nov 25, 2011 at 4:26 PM, Grant Ingersoll <[email protected]>
>> wrote:
>> > I hooked it into the Algorithms page.
>> >
>> > How do you intend to keep the PDF up to date? I like the focus more on
>> > the user, but it would also be good to have some dev docs.
>> >
>> > Also, with both Lanczos and this it would be good if we could hook them
>> > into some real examples.
>> >
>> > On Nov 25, 2011, at 5:42 PM, Dmitriy Lyubimov wrote:
>> >
>> >> Hi,
>> >>
>> >> I put a usage and overview doc for SSVD onto the wiki. I'd appreciate it
>> >> if somebody else could look through it for completeness and
>> >> suggestions.
>> >>
>> >> I tried to approach it as user-facing documentation, i.e. I tried to
>> >> avoid discussing any implementation specifics.
>> >>
>> >> I had several users and Nathan Halko trying it out and actually
>> >> favorably commenting on its scalability vs. Lanczos, but I don't know
>> >> first hand of any production use (even our own use is fairly limited
>> >> in terms of the input volume we have ever processed, and has actually
>> >> somewhat diverged from this Mahout implementation). Perhaps putting it
>> >> more in front of users will help us receive more feedback.
>> >>
>> >> Thanks.
>> >> -Dmitriy
>> >
>> > --------------------------------------------
>> > Grant Ingersoll
>> > http://www.lucidimagination.com
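To make the p = f(k) idea in this thread concrete, here is a minimal single-machine sketch in Python/NumPy of a stochastic (randomized) SVD that shows where the oversampling p enters. The input is projected onto k + p random directions, optionally refined with q power iterations, and a small SVD is then truncated to rank k. The `default_oversampling` heuristic simply interpolates between the two data points mentioned above (k = 100 gives p = 15, k = 500 gives p = 25) and clamps outside that range. Both function names and the interpolation rule are hypothetical illustrations, not Mahout's API or its actual default. The real Mahout SSVD is a distributed MapReduce pipeline, not this in-memory code.

```python
import numpy as np


def default_oversampling(k: int) -> int:
    """Hypothetical default p = f(k): linear interpolation between
    (k=100, p=15) and (k=500, p=25), clamped outside that range.
    Not a Mahout API; illustration only."""
    if k <= 100:
        return 15
    if k >= 500:
        return 25
    return int(round(15 + (k - 100) * (25 - 15) / (500 - 100)))


def ssvd(a, k, p=None, q=0, seed=0):
    """Hypothetical in-memory randomized SVD returning rank-k factors (u, s, vt)."""
    if p is None:
        p = default_oversampling(k)
    n = a.shape[1]
    rng = np.random.default_rng(seed)
    # Project onto k + p random Gaussian directions; the extra p columns
    # make it more likely the top-k subspace is captured.
    omega = rng.standard_normal((n, k + p))
    q_mat, _ = np.linalg.qr(a @ omega)
    # Optional power iterations, re-orthonormalizing after each multiply,
    # to sharpen the subspace when the spectrum decays slowly (noisy input).
    for _ in range(q):
        z, _ = np.linalg.qr(a.T @ q_mat)
        q_mat, _ = np.linalg.qr(a @ z)
    # Small SVD of the projected matrix B = Q^T A, then lift back and truncate.
    b = q_mat.T @ a
    u_hat, s, vt = np.linalg.svd(b, full_matrices=False)
    u = q_mat @ u_hat
    return u[:, :k], s[:k], vt[:k, :]
```

For an exactly rank-5 test matrix, `ssvd(a, k=5)` should recover the leading singular values up to floating-point error. On real, noisy data the approximation quality depends on how fast the spectrum decays. That dependence is why a larger p, or a few power iterations via q, helps, and it is the trade-off behind choosing a default p.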
