Gottfried Helms wrote:
> (Since there were some small errors in the previous text,
> I am sending a complete correction. Please excuse the inconvenience.)
>
> "Arthur J. Kendall" wrote:
>
>>It is unusual.
>>SPSS has had PCA under FACTOR since at least 1972.
>>include the specification
>> /extraction = PC
>>or equivalently
>> /extraction = PA1
>>on the FACTOR procedure.
>>
>>I don't know if it can handle 11,000 variables.
>
> It would need a *lot* of time and memory (at least ~121*16 MB for the
> correlation matrix alone).
>
...<snip>...
>
> Say
> V - array of all 11000 variables, variables vertical, cases horizontal
> --------------------
> V1 - array of the first 150 variables
> V2 - array of the next 150 variables
> ...
> then
> R = corr(V1,V1')
> L0 = cholesky(R) // compute loadings matrix, for instance by the Cholesky method
> I0 = inv(L0) // inverse of the loadings matrix, for score calculation
> Fsc1 = I0*V1 // compute raw scores for the first 150 factors
>
> now compute loadings for all variables. Their loadings are the
> correlations between factors and variables:
>
> Lad1 = corr(Fsc1,V1) // loadings for the first 150 variables
> Lad2 = corr(Fsc1,V2) // loadings for the next 150 variables
> ...
> Ladx = corr(Fsc1,Vx) // loadings for the last 150 variables
>
> put them all together to get a combined loadings matrix for rotations
> Lad = {Lad1,Lad2,Lad3,...,Ladx}
>
> After that you can perform the rotations.
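In NumPy terms, the blockwise procedure quoted above might look like the following sketch. The sizes and all variable names here are mine (toy stand-ins for the 11000-variable, 150-per-block case), and the data are assumed standardized so that correlations reduce to cross products:

```python
import numpy as np

rng = np.random.default_rng(0)
n_cases, n_vars, block = 500, 12, 4   # toy stand-ins for 11000 variables / 150-variable blocks

# variables vertical (rows), cases horizontal (columns), standardized
V = rng.standard_normal((n_vars, n_cases))
V = (V - V.mean(axis=1, keepdims=True)) / V.std(axis=1, keepdims=True)

V1 = V[:block]                          # first block of variables
R = (V1 @ V1.T) / n_cases               # correlation matrix of the first block
L0 = np.linalg.cholesky(R)              # lower-triangular "loadings" via Cholesky
Fsc1 = np.linalg.solve(L0, V1)          # raw factor scores, i.e. inv(L0) @ V1

# loadings for every block = correlations between factors and variables
# (Fsc1 has zero mean and unit variance by construction, so corr is a cross product)
Lad = np.vstack([(V[i:i + block] @ Fsc1.T) / n_cases
                 for i in range(0, n_vars, block)])

# sanity check: the first block is reproduced exactly by its own factors
print(np.allclose(Lad[:block] @ Fsc1, V1))
```

Note that only one block-sized correlation matrix is ever formed; the combined loadings matrix Lad is n_vars by block, never n_vars by n_vars.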
>
> -----------
>
> To get scores, you use matrix-algebra:
>
> Since (approximately)
> [1] Lad * Fsc = V
> [2] Lad'*Lad * Fsc = Lad' * V // here Lad'*Lad is only 150x150
> [3] ILad = inv(Lad'*Lad)
> [4] ILT = ILad*Lad'
> [5] Fsc = ILT * V
>
> you can get factor-scores just by multiplying your variable-values
> by the matrix ILT, which has one dimension of only 150 at most.
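The least-squares score computation in steps [1]-[5] is straightforward to sketch in NumPy. The sizes and names below are mine (toys in place of the 11000-variable, 150-factor case), and the data are generated exactly by the model, so the recovery here is exact:

```python
import numpy as np

rng = np.random.default_rng(1)
n_vars, n_factors, n_cases = 12, 4, 200   # toy stand-ins for 11000 vars / 150 factors

Lad = rng.standard_normal((n_vars, n_factors))   # combined loadings matrix
Fsc_true = rng.standard_normal((n_factors, n_cases))
V = Lad @ Fsc_true                               # data generated exactly by model [1]

# steps [2]-[5]: normal equations; Lad'*Lad is only n_factors x n_factors
ILad = np.linalg.inv(Lad.T @ Lad)   # [3] small inverse, never n_vars x n_vars
ILT = ILad @ Lad.T                  # [4] the one matrix applied to the big data
Fsc = ILT @ V                       # [5] factor scores

print(np.allclose(Fsc, Fsc_true))
```

In practice `np.linalg.lstsq(Lad, V)` does the same job with better numerical behavior than forming the explicit inverse.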
>
> HTH
>
> Gottfried Helms
Gottfried, I haven't had a chance to spend a lot of time thinking about
the method you describe above ... however ... there are algorithms that
will extract the first few eigenvectors from a huge matrix such as the
one described. One such algorithm is the NIPALS algorithm; it requires
much less memory than a traditional PCA algorithm, as it computes the
vectors sequentially. No matrix inversions or Cholesky decompositions
are required. It is used widely in chemometrics and elsewhere.
I am wondering if you have tried the NIPALS algorithm, and how it
compares (both in terms of accuracy and in terms of computer resources
and time used) to the algorithm you describe. The NIPALS algorithm gives
the exact same results (to within roundoff error) as the traditional PCA
algorithm. Furthermore, it takes about 10 lines of MATLAB code to write.
These are two powerful reasons to consider using the NIPALS algorithm.
Using small amounts of memory, and not having to invert or find
Cholesky decompositions of large matrices, are further benefits.
Reference: Martens, H. and Martens, M. (2001) "Multivariate Analysis of
Quality", John Wiley and Sons, Ltd. See Appendix A5
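For the curious, here is a minimal NIPALS sketch (in NumPy rather than MATLAB; the function and variable names are mine, not from the reference). It extracts components one at a time by deflation and, as claimed above, agrees with SVD-based PCA up to sign:

```python
import numpy as np

def nipals(X, n_components, tol=1e-10, max_iter=500):
    """NIPALS PCA: scores T and loadings P, extracted one component at a time."""
    X = X - X.mean(axis=0)              # center the columns (variables)
    T, P = [], []
    for _ in range(n_components):
        t = X[:, np.argmax(X.var(axis=0))].copy()   # start from highest-variance column
        for _ in range(max_iter):
            p = X.T @ t / (t @ t)       # project data onto the current score vector
            p /= np.linalg.norm(p)      # normalize the loading vector
            t_new = X @ p               # update the score vector
            if np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new):
                t = t_new
                break
            t = t_new
        X = X - np.outer(t, p)          # deflate: remove the fitted component
        T.append(t)
        P.append(p)
    return np.column_stack(T), np.column_stack(P)

# agreement with SVD-based PCA on a small random matrix
rng = np.random.default_rng(2)
X = rng.standard_normal((60, 8))
T, P = nipals(X, 2)
U, s, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
print(np.allclose(np.abs(T[:, 0]), np.abs(U[:, 0] * s[0]), atol=1e-6))
```

The key memory property: at no point is a variables-by-variables matrix formed, so the storage cost stays at the size of the (deflated) data matrix plus one score and one loading vector per component.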
Question for the original poster (Dr. V. Ravi): do you really need all
of the PCA vectors, all 150 of them? Or are the first few enough?
--
Paige Miller
[EMAIL PROTECTED]
http://www.kodak.com
"It's nothing until I call it!" -- Bill Klem, NL Umpire
"When you get the choice to sit it out or dance, I hope you dance" --
Lee Ann Womack
=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
. http://jse.stat.ncsu.edu/ .
=================================================================