Actually, perhaps somewhat less than that (around k+p = 800...1000), since we'll
have to hold two Q buffers in the reducers at the same time for Q merging.

On Thu, Nov 18, 2010 at 11:56 AM, Dmitriy Lyubimov <[email protected]> wrote:

> Ok. I guess we'll have to see how it plays out at scale. The current version
> does the computation on Q blocks that have to be k+p wide. With the Hadoop
> default setting, which I think is -Xmx200M, and the constraint of m >= n for
> a Q block, that puts the upper limit on k+p in the area of ~1.4K for
> completely square dense Q blocks, other expenses notwithstanding, with
> default child process settings. I am guessing that is certainly going to be
> enough for my personal purposes :-). I will expect somebody to provide a
> correction on that for Mahout's goals.
>
> On Thu, Nov 18, 2010 at 11:41 AM, Ted Dunning <[email protected]> wrote:
>
>> There is an ironic tension with these.  Using the power iterations is
>> generally bad numerically, but having a small p is much worse for
>> accuracy.  That means that factoring (A' A)^q A will get much more
>> accurate values for the same value of p.  Alternately phrased, getting
>> the same accuracy would require a much larger value of p and thus would
>> overcome the cost of the initial power iteration.
>>
>> How this works out in practice on truly massive scale is totally up in
>> the air.  The result of the stochastic projection can actually be
>> *larger* than the original sparse matrix, which would seem to imply that
>> the power method might actually save time sometimes.
>>
>> On Thu, Nov 18, 2010 at 11:07 AM, Dmitriy Lyubimov <[email protected]> wrote:
>>
>> > Further work on this may include an implementation of power iterations
>> > (although I doubt there's much to be had from them at such big volumes).
>> >
>>
>
>
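For reference, the power-iteration variant discussed above can be sketched in
NumPy. This is a minimal illustrative sketch, not Mahout's implementation; the
function name and parameters are hypothetical, and the sample matrix is taken
as Y = (A A')^q A * Omega, the usual randomized-SVD formulation with
oversampling parameter p:

```python
import numpy as np

def randomized_range(A, k, p, q, rng=None):
    """Orthonormal basis Q whose range approximates the top-k range of A.

    Illustrative sketch: Omega is an n x (k+p) Gaussian test matrix, and
    q power steps form Y = (A A')^q A Omega before the QR factorization.
    """
    rng = rng or np.random.default_rng(0)
    m, n = A.shape
    omega = rng.standard_normal((n, k + p))  # random test matrix
    Y = A @ omega                            # initial sample, m x (k+p)
    for _ in range(q):                       # each step sharpens the spectrum
        Y = A @ (A.T @ Y)
    Q, _ = np.linalg.qr(Y)                   # thin QR: m x (k+p) orthonormal Q
    return Q
```

In practice one re-orthonormalizes Y between power steps (subspace iteration)
to avoid the numerical degradation mentioned above; the plain form here keeps
the sketch short.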
