Actually, it occurs to me that as far as reducers are concerned, we can thin things down even further by splitting the Qhat blocks, but mappers have to hold a Q block of (k+p) x r in memory in its entirety.
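A back-of-envelope sketch (not Mahout code; the helper name and the 8-byte-double assumption are mine) of what holding a dense (k+p)-wide Q block in a mapper costs, for comparison against the heap limits discussed below:

```python
def q_block_bytes(k_plus_p, rows=None):
    """Raw bytes for a dense rows x (k+p) block of 8-byte doubles.
    Under the m >= n constraint, a completely square block has
    rows == k+p (illustrative estimate only; ignores JVM object
    overhead, GC headroom, and other per-process expenses)."""
    rows = k_plus_p if rows is None else rows
    return rows * k_plus_p * 8

# A square block at k+p = 1400 is 1400 * 1400 * 8 = 15,680,000 bytes
# (~15.7 MB of raw array data); two such buffers held at once for
# Q merging roughly double that.
print(q_block_bytes(1400))
```

The gap between these raw array sizes and the ~200 MB default child heap is presumably what "other expenses notwithstanding" covers in the thread.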
On Thu, Nov 18, 2010 at 12:00 PM, Dmitriy Lyubimov <[email protected]> wrote:

> Actually, perhaps somewhat less than that (around k+p = 800...1000), since
> we'll have to have 2 Q buffers in reducers at the same time for Q merging.
>
> On Thu, Nov 18, 2010 at 11:56 AM, Dmitriy Lyubimov <[email protected]> wrote:
>
>> Ok. I guess we'll have to see how it plays out at scale. The current
>> version does computation on Q blocks that have to be k+p wide. With the
>> Hadoop default setting, which I think is -Xmx200M, and the constraint
>> m >= n for a Q block, that puts the upper limit on k+p in the area of
>> ~1.4K for completely square dense Q blocks, other expenses
>> notwithstanding, with default child process settings. I'd guess that's
>> certainly going to be enough for my personal purposes :-). I'll expect
>> somebody to provide a correction on that for Mahout's goals.
>>
>> On Thu, Nov 18, 2010 at 11:41 AM, Ted Dunning <[email protected]> wrote:
>>
>>> There is an ironic tension with these. Using power iterations is
>>> generally bad numerically, but having a small p is much worse for
>>> accuracy. That means that factoring (A' A)^q A will get much more
>>> accurate values for the same value of p. Alternately phrased, getting
>>> the same accuracy would require a much larger value of p and thus
>>> would overcome the cost of the initial power iteration.
>>>
>>> How this works out in practice at truly massive scale is totally up in
>>> the air. The result of the stochastic projection can actually be
>>> *larger* than the original sparse matrix, which would seem to imply
>>> that the power method might actually save time sometimes.
>>>
>>> On Thu, Nov 18, 2010 at 11:07 AM, Dmitriy Lyubimov <[email protected]> wrote:
>>>
>>>> Further work on this may include implementation of power iterations
>>>> (although I doubt there's much to be had of them on such big volumes).
