OK, I guess we'll have to see how it plays out at scale. The current version does computation on Q blocks that have to be k+p wide. With the Hadoop default setting, which I think is -Xmx200M, and the constraint of m >= n for a Q block, that puts the upper limit on k+p in the area of ~1.4K for completely square dense Q blocks, other expenses notwithstanding, with default child process settings. I am going to guess that is certainly enough for my personal purposes :-). I'd expect somebody to provide a correction on that for Mahout's goals.
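For what it's worth, the arithmetic behind that bound can be checked on the back of an envelope. This is just a sketch of the sizes involved, not Mahout code; the ~1.4K figure also has to absorb QR working storage and JVM overhead, which aren't modeled here:

```python
# Rough size check for a dense square (k+p) x (k+p) block of doubles
# against the default Hadoop child heap. Only illustrates magnitudes;
# actual headroom depends on working copies the solver keeps around.

BYTES_PER_DOUBLE = 8
HEAP_BYTES = 200 * 1024 * 1024  # Hadoop child default, -Xmx200M

def block_bytes(n):
    """Memory for one dense n x n block of 8-byte doubles."""
    return n * n * BYTES_PER_DOUBLE

n = 1400  # k + p ~ 1.4K, square Q block (m = n)
print(block_bytes(n) / 1e6)          # one block is ~15.7 MB
print(HEAP_BYTES // block_bytes(n))  # ~13 such blocks fit in the heap
```

So a single 1.4K-square block is only ~16 MB; the limit comes from needing room for several working arrays of that size plus everything else in the 200 MB heap.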
On Thu, Nov 18, 2010 at 11:41 AM, Ted Dunning <[email protected]> wrote:

> There is an ironic tension with these. Using the power iterations is
> generally bad numerically, but having a small p is much worse for
> accuracy. That means that factoring (A' A)^q A will get much more
> accurate values for the same value of p. Alternately phrased, getting
> the same accuracy would require a much larger value of p and thus would
> overcome the cost of the initial power iteration.
>
> How this works out in practice on truly massive scale is totally up in
> the air. The result of the stochastic projection can actually be
> *larger* than the original sparse matrix, which would seem to imply
> that the power method might actually save time sometimes.
>
> On Thu, Nov 18, 2010 at 11:07 AM, Dmitriy Lyubimov <[email protected]>wrote:
>
> > Further work on this may include implementation of power iterations
> > (although I doubt there's much to be had of them on such big volumes).
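To make the power-iteration trade-off concrete, here is a plain NumPy sketch of the idea being discussed (again, not the Mahout implementation): project A through a random n x (k+p) test matrix, optionally sharpen the basis with q power passes, then do the small factorization. The names k, p, q follow the thread's usage; the re-orthogonalization between passes is the standard remedy for the numerical trouble Ted mentions:

```python
import numpy as np

rng = np.random.default_rng(0)

def randomized_svd(A, k, p=10, q=1):
    """Rank-k SVD via random projection with q power iterations.

    p is the oversampling parameter; larger p improves accuracy,
    and each power pass applies (A A') once more to the sample,
    which sharpens the subspace for the same p.
    """
    m, n = A.shape
    omega = rng.standard_normal((n, k + p))  # random test matrix
    Y = A @ omega
    # Power iterations: effectively Y <- (A A')^q A omega, but with a
    # QR re-orthogonalization each half-pass so small singular
    # directions are not lost to round-off.
    for _ in range(q):
        Q, _ = np.linalg.qr(Y)
        Q, _ = np.linalg.qr(A.T @ Q)
        Y = A @ Q
    Q, _ = np.linalg.qr(Y)   # orthonormal basis for the sampled range
    B = Q.T @ A              # small (k+p) x n problem
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k, :]

# Usage: for a matrix of rank <= k+p the top singular values are
# recovered essentially exactly.
A = rng.standard_normal((500, 25)) @ rng.standard_normal((25, 300))
U, s, Vt = randomized_svd(A, k=20)
```

The point of Ted's remark shows up here: when rank(A) exceeds k+p, raising q buys accuracy that would otherwise require a much larger p, at the cost of extra passes over A.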
