One of the most serious worries is that B_i is roughly the same size as A, which may make the successive products very expensive. In particular, it looks at first glance like we may need column-wise segmentation of A, in addition to row-wise segmentation, in order to compute A B'_j. That is only a first-glance impression, so it is probably not a very deep analysis.
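
To make the shapes in question concrete, here is a rough NumPy sketch of the power-iteration scheme described in the quoted message below. It is not the Mahout SSVD code. The names `sample_range`, `synthetic` and `subspace_error`, the parameters `k`, `p` and `q`, and the reading of B as Y'A (so that B' is n x (k+p)) are all my assumptions for illustration, and everything is held in memory rather than split by rows.

```python
import numpy as np


def sample_range(A, k, p=10, q=2, orthogonalize=True, seed=0):
    """Orthonormal basis Q for the range of (A A')^q A Omega (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    # Initial random sample of the range of A: Y is m x (k+p).
    Y = A @ rng.standard_normal((n, k + p))
    for _ in range(q):
        if orthogonalize:
            Y, _ = np.linalg.qr(Y)      # re-orthogonalize between iterations
        Bt = A.T @ Y                    # B' with B = Y'A (assumed); n x (k+p)
        if orthogonalize:
            Bt, _ = np.linalg.qr(Bt)    # same column space, better conditioned
        Y = A @ Bt                      # the A B' product from the thread
    Q, _ = np.linalg.qr(Y)              # final orthogonalization is always done
    return Q


def synthetic(m, n, k, top, seed=1):
    """Test matrix with k singular values equal to `top` and the rest equal to 1."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((m, n)))
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))
    s = np.ones(n)
    s[:k] = top
    return (U * s) @ V.T, U[:, :k]


def subspace_error(Q, Uk):
    """Frobenius norm of the part of Uk that lies outside the span of Q."""
    return np.linalg.norm(Uk - Q @ (Q.T @ Uk))


if __name__ == "__main__":
    A, Uk = synthetic(200, 100, k=5, top=3.0)
    for q in (0, 1, 2):
        Q = sample_range(A, k=5, p=5, q=q)
        print(q, subspace_error(Q, Uk))
```

In this sketch B' has n rows and only k+p columns, but it is dense, so for a sparse A it can still be comparable in size to A. Setting `orthogonalize=False` skips the intermediate QR steps and leaves only the final one, which is the shortcut discussed in the quoted message.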

On Thu, Aug 25, 2011 at 7:31 PM, Nathan Halko <[email protected]> wrote:

> The lower the condition number (or low signal to noise) the harder it is to
> extract the top k singular vectors because in a sense they are not that much
> more important than the other n-k. We see pollution from the smaller n-k
> singular directions and that degrades our approximation of the top k space.
> Power iterations (just a few) are extremely important to amplify the gap
> between important directions and the unimportant directions. Instead of
> sampling matrix A, we sample matrix (AA*)^qA which has the same singular
> vectors but an exaggerated spectrum
>
> sigma^{2q+1}
>
> In infinite precision there would be no need to orthogonalize between
> iterations, only at the last step. However, in finite precision, the small
> singular values can fall below machine precision when taken to the 2q+1st
> power and we won't be able to accurately recover them. It also prevents
> overflow if your matrix has a very large sig_max. It is mostly a precaution
> to keep from losing information and for most cases could probably be
> skipped or done only intermittently. If orthogonalization is a bottleneck
> we could consider not doing it.
>
> > > Modified power iterations in existing SSVD code
> > > -----------------------------------------------
> > >
> > > Key: MAHOUT-796
> > > URL: https://issues.apache.org/jira/browse/MAHOUT-796
> > > Project: Mahout
> > > Issue Type: Improvement
> > > Components: Math
> > > Affects Versions: 0.5
> > > Reporter: Dmitriy Lyubimov
> > > Assignee: Dmitriy Lyubimov
> > > Labels: SSVD
> > > Fix For: 0.6
