We commonly do a PCA on matrices of, say, a hundred variables and, say, 10,000
samples.  As an earlier poster said, the information can be obtained
from analysis of the smaller matrix.  The run time is not too long on a PC (maybe 2-10
minutes); the problem is really the voluminous output.  Among our applications,
analysis of remote-sensing multispectral data sounds similar to your data structure, in
that we may have a pixel matrix of 5000 x 5000 ("samples") and perhaps 50 spectral
bins (variables).  John Imbrie and Ed Klovan showed the way 30 or so years ago, and I am
sure that many others have described use of the Q matrix in this way.
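
To illustrate the "smaller matrix" point: the nonzero eigenvalues of the
sample-by-sample (Q-mode) cross-product matrix and of the variable-by-variable
matrix coincide, so one can always eigendecompose the smaller of the two. A
minimal numpy sketch, with hypothetical dimensions matching the example above:

    import numpy as np

    # Hypothetical data: 10,000 samples x 100 variables
    rng = np.random.default_rng(0)
    X = rng.standard_normal((10000, 100))
    Xc = X - X.mean(axis=0)            # column-center before PCA

    # Eigendecompose the 100x100 variable-by-variable matrix instead of
    # the 10,000x10,000 sample-by-sample (Q-mode) matrix.
    C = Xc.T @ Xc                      # 100 x 100
    evals, evecs = np.linalg.eigh(C)   # eigh returns ascending order
    order = np.argsort(evals)[::-1]
    evals, evecs = evals[order], evecs[:, order]

    scores = Xc @ evecs                # component scores, 10,000 x 100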

Gottfried Helms wrote:

> (Since there were some small errors in the previous text,
>  I am just sending a complete correction. Please excuse the inconvenience.)
>
> "Arthur J. Kendall" schrieb:
> >
> > It is unusual.
> > SPSS has had PCA under FACTOR since at least 1972.
> > Include the specification
> >   /extraction = PC
> > or equivalently
> >  /extraction = PA1
> > on the FACTOR procedure.
> >
> > I don't know if it can handle 11,000 variables.
> >
>
> It would need a *lot* of time and memory (at least ~121*16 MB just for the
> correlation matrix).
>
> I remember having read articles about "large matrices" or "large sparse matrices"
> some years ago... Search via Google, also in sci.stat.consult. They talked about
> these matrix dimensions.
>
> Concerning 150 cases with 11,000 variables: you get at most 150 factors, and
> have linear dependencies after that.
>
> If nothing else helps, you could do a PCA on the first 150 variables and save the
> scores.
> Then you can correlate all variables with the factor scores and put the
> correlations together to form a proper factor-loadings matrix. This way you can
> use as many variables as your program can handle per run. Putting them all
> together gives you a factor-loadings matrix of 11000*150 (~26 MB per matrix),
> which might be *a little* easier to handle than 11000*11000. With SPSS you can
> read a ready-made factor-loadings matrix directly into the FACTOR procedure.
>
> Only you don't get factor scores then. If you need them, you can use the matrix-
> language facility to build the pseudoinverse of your final factor-loadings matrix
> (which has one dimension of only 150) and matrix-multiply it with your raw data.
>
>  Say
>     V      - array of all 11000 variables, variables vertical, cases horizontal
>     --------------------
>     V1     - array of the first 150 variables
>     V2     - array of the next  150 variables
>     ...
>  then
>     R    = corr(V1, V1)   // 150x150 correlation matrix of the first block
>     L0   = cholesky(R)    // compute a loadings matrix, for instance by the Cholesky method
>     I0   = inv(L0)        // inverse of the loadings matrix, for scores calculation
>     Fsc1 = I0*V1          // compute raw scores for your 150 factors
>
>  now compute loadings for all variables. Their loadings are the
>  correlations between factors and variables:
>
>     Lad1  = corr(Fsc1, V1)   // loadings for the first 150 variables
>     Lad2  = corr(Fsc1, V2)   // loadings for the next  150 variables
>    ...
>     Ladx  = corr(Fsc1, Vx)   // loadings for the last  150 variables
>
>  put them all together to have a combined loadings matrix for the rotations:
>     Lad = {Lad1, Lad2, ..., Ladx}
>
>  After that you can perform the rotations.
>
>  -----------
>
>  To get scores, you use matrix algebra:
>
>  Since
>  [1]  Lad * Fsc = V
>  [2]  Lad'*Lad * Fsc = Lad' * V    // Lad'*Lad is only 150*150
>  [3]  ILad = inv(Lad'*Lad)
>  [4]  ILT  = ILad * Lad'
>  [5]  Fsc  = ILT * V
>
>   you can get factor scores just by multiplying your variable values
>   by the matrix ILT, which has one dimension of only 150 at most.
>
> HTH
>
> Gottfried Helms
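
To make Gottfried's block-wise recipe concrete, here is a minimal numpy sketch
of the same steps with random stand-in data. One caveat: the base block must
have fewer variables than cases, otherwise its correlation matrix is singular
and the Cholesky step fails, so the sketch uses 100 base variables instead of 150:

    import numpy as np

    rng = np.random.default_rng(1)
    n_cases, n_vars, k = 150, 11000, 100
    Z = rng.standard_normal((n_cases, n_vars))
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0, ddof=1)  # standardize columns

    Z1 = Z[:, :k]                        # base block of k variables
    R  = Z1.T @ Z1 / (n_cases - 1)       # k x k correlation matrix
    L0 = np.linalg.cholesky(R)           # a valid loadings matrix: R = L0 @ L0.T
    F  = np.linalg.solve(L0, Z1.T).T     # scores, so that Z1 = F @ L0.T

    # Loadings of *every* variable = its correlation with the scores,
    # computed block by block so only k variables are held at a time.
    blocks = [Z[:, i:i + k] for i in range(0, n_vars, k)]
    Lad = np.vstack([b.T @ F / (n_cases - 1) for b in blocks])  # n_vars x k

Since both the standardized variables and the scores have zero mean and unit
variance, the cross-products above are exactly the correlations Gottfried
describes; PCA loadings could be substituted for the Cholesky factor.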

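To get the scores from the combined loadings matrix, steps [1]-[5] translate
directly; np.linalg.lstsq computes the same least-squares solution more stably
than the explicit inverse. Again a sketch with random stand-ins:

    import numpy as np

    # Stand-ins: Lad is an 11000 x 100 loadings matrix, V the standardized
    # data with variables vertical, as in the pseudocode above.
    rng = np.random.default_rng(2)
    Lad = rng.standard_normal((11000, 100))
    V   = rng.standard_normal((11000, 150))

    ILT = np.linalg.inv(Lad.T @ Lad) @ Lad.T   # steps [3]-[4]: 100 x 11000
    Fsc = ILT @ V                              # step [5]: scores, 100 x 150

    # Equivalent, numerically preferable:
    Fsc_ls, *_ = np.linalg.lstsq(Lad, V, rcond=None)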