On Tue, Nov 29, 2011 at 9:56 AM, Nathan Halko <[email protected]> wrote:
> The docs look great Dmitriy.  Has anyone considered giving oversampling
> parameter p a default value? Say p = 25.  Slightly high but I imagine most
> use cases are noisy and could benefit from the larger value.

Yes that's a good idea that did not occur to me. This guy might get a
default value.

But wouldn't a good default also depend on k? Say, if you ask for k=100
then perhaps p=15 is enough, but if you ask for k=500 then 25 sounds
about right. Perhaps we could come up with a heuristic here and make
the default p = some f(k).
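
A minimal sketch of one such f(k), assuming nothing more than a straight
line through the two points above (k=100 -> p=15, k=500 -> p=25). The
function name, the floor of 15 and the cap of 50 are hypothetical choices
for illustration and are not anything that exists in Mahout:

```python
def default_oversampling(k, p_floor=15, p_cap=50):
    """Guess a default oversampling p from the requested rank k.

    Assumption: linear interpolation through (k=100, p=15) and
    (k=500, p=25), clamped to [p_floor, p_cap]. Both bounds are
    illustrative guesses, not tested recommendations.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    # Slope is (25 - 15) / (500 - 100) = 1/40 extra columns per unit of k.
    p = 15 + (k - 100) / 40.0
    # Clamp so small k keeps some oversampling and huge k stays bounded.
    return int(min(p_cap, max(p_floor, round(p))))


if __name__ == "__main__":
    for k in (50, 100, 300, 500, 5000):
        print(k, default_oversampling(k))
```

Whether a linear f(k) is the right shape at all is an open question; this
only reproduces the two data points suggested above.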

> I have been
> testing ssvd and lanczos svd on Amazon EMR.  Seeing about a 15x speedup in
> ssvd over lanczos which is promising.  Trying to scale out horizontally but
> not seeing any difference between using one slave or many slaves.  Any
> ideas? (I won't go into detail about the setup here but if sounds familiar
> I'd like to talk more).  The basic problem with lanczos in the distributed
> environment seems to be that a matrix-vector multiply is not enough work to
> offset any setup costs, also there is not a distributed orthogonalization
> with lanczos and I'm getting OOM's making it difficult to scale.  I would
> still like to contribute what results I have found but I'm short on time so
> nothing besides work directly related to the completion of my thesis will
> happen until that is done.
>
> On Fri, Nov 25, 2011 at 5:37 PM, Dmitriy Lyubimov <[email protected]> wrote:
>
>> I attached the latex source as well (lyx, actually). I would've used
>> Wiki if it supported mathjax. So anyone can modify the usage if need
>> be. (Anyone who has lyx anyway).
>>
>> Dev docs were attached to several jira issues (and i had blog
>> entries); if you want more recent copies of them moved over
>> to the wiki, i'd be happy to do that. Mainly, so far there are 2 working
>> notes, one for the original method, and another for power iterations,
>> attached to the corresponding jiras.
>>
>>
>> On Fri, Nov 25, 2011 at 4:26 PM, Grant Ingersoll <[email protected]>
>> wrote:
>> > I hooked it into the Algorithms page.
>> >
>> > How do you intend to keep the PDF up to date?  I like the focus more on
>> > the user, but it would also be good to have some dev docs.
>> >
>> > Also, with both Lanczos and this it would be good if we could hook them
>> > into some real examples.
>> >
>> > On Nov 25, 2011, at 5:42 PM, Dmitriy Lyubimov wrote:
>> >
>> >> Hi,
>> >>
>> >> I put a usage and overview doc for SSVD onto the wiki. I'd appreciate
>> >> it if somebody else could look through it, to check for completeness
>> >> and make suggestions.
>> >>
>> >> I tried to approach it as user-facing documentation, i.e. I tried to
>> >> avoid discussing any implementation specifics.
>> >>
>> >> I had several users and Nathan Halko trying it out and actually
>> >> commenting favorably on its scalability vs. Lanczos, but i don't know
>> >> first hand of any production use (even our own use is fairly limited
>> >> in terms of the input volume we have ever processed, and has actually
>> >> diverged somewhat from this Mahout implementation). Perhaps putting it
>> >> more in front of users will help to get more feedback.
>> >>
>> >> Thanks.
>> >> -Dmitriy
>> >
>> > --------------------------------------------
>> > Grant Ingersoll
>> > http://www.lucidimagination.com
>> >
>> >
>> >
>> >
>>
