Actually, it occurs to me that as far as reducers are concerned, we can thin things down even further by splitting the Qhat blocks, but mappers have to hold a Q block of (k+p) x r in memory in its entirety.
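A back-of-envelope sketch (not Mahout code; the helper name and the 8-byte-double assumption are mine) of what holding a dense (k+p)-wide Q block in a mapper costs, for comparison against the heap limits discussed below:

```python
def q_block_bytes(k_plus_p, rows=None):
    """Raw bytes for a dense rows x (k+p) block of 8-byte doubles.
    Under the m >= n constraint, a completely square block has
    rows == k+p (illustrative estimate only; ignores JVM object
    overhead, GC headroom, and other per-process expenses)."""
    rows = k_plus_p if rows is None else rows
    return rows * k_plus_p * 8

# A square block at k+p = 1400 is 1400 * 1400 * 8 = 15,680,000 bytes
# (~15.7 MB of raw array data); two such buffers held at once for
# Q merging roughly double that.
print(q_block_bytes(1400))
```

The gap between these raw array sizes and the ~200 MB default child heap is presumably what "other expenses notwithstanding" covers in the thread.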
On Thu, Nov 18, 2010 at 12:00 PM, Dmitriy Lyubimov <[email protected]> wrote:

> Actually, perhaps somewhat less than that (around k+p = 800...1000), since
> we'll have to have 2 Q buffers in reducers at the same time for Q merging.
>
> On Thu, Nov 18, 2010 at 11:56 AM, Dmitriy Lyubimov <[email protected]> wrote:
>
>> Ok. I guess we'll have to see how it plays out at scale. The current
>> version does computation on Q blocks that have to be k+p wide. With the
>> Hadoop default setting, which I think is -Xmx200M, and the constraint
>> m >= n for a Q block, that puts the upper limit on k+p in the area of
>> ~1.4K for completely square dense Q blocks, other expenses
>> notwithstanding, with default child process settings. I'd guess that's
>> certainly going to be enough for my personal purposes :-). I'll expect
>> somebody to provide a correction on that for Mahout's goals.
>>
>> On Thu, Nov 18, 2010 at 11:41 AM, Ted Dunning <[email protected]> wrote:
>>
>>> There is an ironic tension with these. Using power iterations is
>>> generally bad numerically, but having a small p is much worse for
>>> accuracy. That means that factoring (A' A)^q A will get much more
>>> accurate values for the same value of p. Alternately phrased, getting
>>> the same accuracy would require a much larger value of p and thus
>>> would overcome the cost of the initial power iteration.
>>>
>>> How this works out in practice at truly massive scale is totally up in
>>> the air. The result of the stochastic projection can actually be
>>> *larger* than the original sparse matrix, which would seem to imply
>>> that the power method might actually save time sometimes.
>>>
>>> On Thu, Nov 18, 2010 at 11:07 AM, Dmitriy Lyubimov <[email protected]> wrote:
>>>
>>>> Further work on this may include implementation of power iterations
>>>> (although I doubt there's much to be had of them on such big volumes).
