On Fri, Sep 25, 2009 at 1:28 PM, Ted Dunning <[email protected]> wrote:
> With these randomized algorithms, you multiply several random vectors by
> your matrix A at one time to get A R = y. Since you choose a few extra
> vectors in R, you know with high certainty that the projection of these
> random vectors y spans the range of A. This means that you can use y to
> compute an orthonormal basis of A very accurately.
>
> So the key trick here is that unlike power methods, you aren't doing
> repeated multiplications by A. That avoids the entire round-off explosion
> problem.

Ok, so the round-off explosion / MAX_DOUBLE overflow avoidance is great, I
dig that part. So the way this goes is that instead of computing A r, then
A(A r), and so on, you just compute A R, which has the same algorithmic
complexity as Lanczos. (At first I assumed that "one pass" through the
matrix meant you'd be avoiding the factor of "k" in the asymptotic
complexity of the rank-k decomposition, which would have been a pretty neat
trick!)

> Moreover, since all of these multiplications are independent, you can do
> these multiplications row by row on A. As you do this, you can store A R
> and at the same time be recursively doing a random decomposition on AR so
> that after one pass of the data, you have a small dense matrix to work on.
> Another pass over AR and one on A' later and you have the full
> decomposition. If you only want one of the sets of eigenvectors, then you
> only need the pass over AR, which is much smaller than A.

Oh, now that is indeed clever! N x N -> N x k -> k x k -> multiply back,
and you're pretty much done. That shouldn't be too hard to Hadoop-ify. I'll
see if I can try it out while porting over the decomposer stuff.
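For concreteness, here's a rough in-memory numpy sketch of how I'm picturing
the plain (non-recursive, non-streaming) version of this; k, the oversampling
p, and the toy matrix are made-up placeholders, not anything from the
decomposer code:

import numpy as np

def randomized_svd(A, k, p=10, seed=0):
    """Rough sketch of a one-shot randomized rank-k SVD, no power iterations.

    k and the oversampling parameter p are illustrative choices; A is any
    (m x n) dense array for the purposes of this sketch.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape

    # Push k + p random vectors through A in a single multiplication:
    # Y = A R spans the range of A with high probability. No A(A r) chains,
    # so no round-off explosion.
    R = rng.standard_normal((n, k + p))
    Y = A @ R                           # m x (k + p)

    # Orthonormal basis for the range of A via QR on the thin block Y.
    Q, _ = np.linalg.qr(Y)              # m x (k + p)

    # Project A into that basis: B is small and dense, cheap to decompose.
    B = Q.T @ A                         # (k + p) x n

    # Exact SVD of the small matrix, then map back: A ~= (Q Ub) S Vt.
    Ub, S, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ Ub
    return U[:, :k], S[:k], Vt[:k, :]

# Toy check: recover a rank-40 matrix almost exactly.
if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.standard_normal((500, 40)) @ rng.standard_normal((40, 300))
    U, S, Vt = randomized_svd(A, k=40)
    print(np.linalg.norm(A - (U * S) @ Vt) / np.linalg.norm(A))  # ~1e-13

Note this naive version touches A twice (once for Y, once for B); the
row-by-row / recursive trick above is what cuts that down to one pass over A
plus passes over the much smaller AR.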
-jake