Actually, perhaps somewhat less than that (around k+p = 800...1000), since we'll have to hold 2 Q buffers in the reducers at the same time for Q merging.
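The back-of-envelope behind these limits can be sketched as follows. This is only a sketch, not Mahout code: `usable_fraction` and `n_buffers` are assumed knobs, and real JVM overheads (object headers, temporaries made during the QR step) will push the achievable k+p below this bound, which is why the thread's ~1.4K estimate is lower than the raw arithmetic suggests.

```python
import math

def max_kp(heap_bytes=200 * 1024 * 1024, usable_fraction=0.5, n_buffers=2):
    """Largest k+p such that n_buffers dense square (k+p) x (k+p)
    buffers of doubles fit in the usable part of the heap.

    heap_bytes defaults to the Hadoop child-process -Xmx200M mentioned
    in the thread; usable_fraction and n_buffers are assumptions.
    """
    budget = heap_bytes * usable_fraction
    per_buffer = budget / n_buffers
    return int(math.sqrt(per_buffer / 8))  # 8 bytes per double

print(max_kp(n_buffers=1))  # a single Q buffer -> 3620
print(max_kp(n_buffers=2))  # two Q buffers during merging -> 2560
```

Halving the usable fraction again (or tripling the buffer count) brings the bound down into the 1-2K range, consistent with the rough figures quoted below.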
On Thu, Nov 18, 2010 at 11:56 AM, Dmitriy Lyubimov <[email protected]> wrote:

> Ok. I guess we'll have to see how it plays out at scale. The current version
> does computation on Q blocks that have to be k+p wide. With the Hadoop
> default setting, which I think is -Xmx200M, and the constraint of m >= n for
> a Q block, that puts the upper limit on k+p in the area of ~1.4K for
> completely square dense Q blocks, other expenses notwithstanding, with
> default child-process settings. I am going to guess it is certainly going to
> be enough for my personal purposes :-). I will expect somebody to provide a
> correction on that for Mahout's goals.
>
> On Thu, Nov 18, 2010 at 11:41 AM, Ted Dunning <[email protected]> wrote:
>
>> There is an ironic tension with these. Using the power iterations is
>> generally bad numerically, but having a small p is much worse for
>> accuracy. That means that factoring (A' A)^q A will get much more
>> accurate values for the same value of p. Alternately phrased, getting the
>> same accuracy would require a much larger value of p and thus would
>> overcome the cost of the initial power iteration.
>>
>> How this works out in practice at truly massive scale is totally up in
>> the air. The result of the stochastic projection can actually be *larger*
>> than the original sparse matrix, which would seem to imply that the power
>> method might actually save time sometimes.
>>
>> On Thu, Nov 18, 2010 at 11:07 AM, Dmitriy Lyubimov <[email protected]> wrote:
>>
>>> Further work on this may include implementation of power iterations
>>> (although I doubt there's much to be had from them on such big volumes).
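For reference, the power-iteration variant discussed above can be sketched in NumPy following the standard randomized-SVD scheme (a sketch only, not Mahout's implementation; the function and parameter names here are mine, and q = 0 recovers the plain stochastic projection):

```python
import numpy as np

def rand_svd(A, k, p=10, q=1, seed=0):
    """Randomized SVD sketch: sample the range of A with a (k+p)-column
    Gaussian test matrix, optionally sharpened by q power iterations,
    then recover the top-k factors from the small projected matrix."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Y = A @ rng.standard_normal((n, k + p))  # stochastic projection
    for _ in range(q):
        Y = A @ (A.T @ Y)                    # one power iteration
    Q, _ = np.linalg.qr(Y)                   # orthonormal basis for range(Y)
    B = Q.T @ A                              # small (k+p) x n matrix
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]
```

Each power iteration costs two more passes over A, which is the trade-off weighed in the thread: the iterations buy accuracy that would otherwise require a much larger oversampling parameter p.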
