I think that the solver actually does an SVD, but most of what you say follows.
There is one strangeness, I think, in that DistributedRowMatrix.times is doing a transposeTimes operation, not the normal times. Jake should comment.

On Thu, Sep 2, 2010 at 8:28 PM, Jeff Eastman <[email protected]> wrote:

> On 9/2/10 7:41 PM, Jeff Eastman wrote:
>
>> Hopefully answering my own question here but ending up with another. The
>> svd matrix I'd built from the eigenvectors is the wrong shape as I built it.
>> Taking Jake's "column space" literally and building a matrix where each of
>> the columns is one of the eigenvectors does give a matrix of the correct
>> shape. The math works with DenseMatrix, producing a new data matrix which is
>> 15x7; a significant dimensionality reduction from 15x39.
>>
>> In this example, with 15 samples having 39 terms and 7 eigenvectors:
>> A = [15x39]
>> P = [39x7]
>> A P = [15x7]
>> <snip>
>>
> Representing the eigen decomposition math in the above notation, A P is the
> projection of the data set onto the eigenvector basis:
>
> If:
> A = original data matrix
> P = eigenvector column matrix
> D = eigenvalue diagonal matrix
>
> Then:
> A P = P D => A = P D P'
>
> Since we have A and P is already calculated by DistributedLanczosSolver it
> is easy to compute A P and we don't need the eigenvalues at all. This is
> good because the DLS does not output them. Is this why it doesn't bother?
>
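
For what it's worth, the shape bookkeeping in the quoted message can be checked with a small sketch. This is plain Python on lists of rows, not Mahout code: the helper names (matmul, transpose_times, shape) and the placeholder data are made up for illustration. It assumes P holds the eigenvectors as columns, and it also shows what operand a transposeTimes-style product would need, if that is indeed what times is computing.

```python
def matmul(a, b):
    """Plain matrix product a * b, with matrices stored as lists of rows."""
    assert len(a[0]) == len(b), "inner dimensions must agree"
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Return a' as a new list of rows."""
    return [list(col) for col in zip(*a)]


def transpose_times(a, b):
    """Compute a' * b, the operation the reply suspects times() performs."""
    return matmul(transpose(a), b)


def shape(m):
    """Return (rows, columns)."""
    return (len(m), len(m[0]))


# Placeholder data with the shapes from the thread (values are arbitrary).
A = [[float(i + j) for j in range(39)] for i in range(15)]           # 15 samples x 39 terms
P = [[1.0 if i == j else 0.0 for j in range(7)] for i in range(39)]  # 39 x 7, one vector per column

projected = matmul(A, P)
print(shape(projected))  # (15, 7)

# A' * B is only defined when B has 15 rows, so P cannot be passed to it directly.
B = [[1.0] * 7 for _ in range(15)]
print(shape(transpose_times(A, B)))  # (39, 7)
```

So if times really behaves like transpose_times, calling it on A with P as the argument would not give the 15x7 projection; the operands would have to be arranged so that the transpose lands on the right matrix.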
