Gottfried Helms wrote:
> (Since there were some small errors in the previous text,
> I am sending a complete correction. Please excuse the inconvenience.)
>
> "Arthur J. Kendall" wrote:
>
>>It is unusual.
>>SPSS has had PCA under FACTOR since at least 1972.
>>include the specification
>> /extraction = PC
>>or equivalently
>> /extraction = PA1
>>on the FACTOR procedure.
>>
>>I don't know if it can handle 11,000 variables.
>
> It would need a *lot* of time and memory (at least ~121*16 MB for the
> correlation matrix alone).
>
...<snip>...
>
> Say
> V - array of all 11000 variables, variables vertical, cases horizontal
> --------------------
> V1 - array of the first 150 variables
> V2 - array of the next 150 variables
> ...
> then
> R = corr(V1,V1')
> L0 = cholesky(R) // compute loadings matrix, for instance by the Cholesky method
> I0 = inv(L0) // inverse of the loadings matrix, for score calculation
> Fsc1 = I0*V1 // compute raw scores for the first 150 factors
>
> now compute loadings for all variables. Their loadings are the
> correlations between factors and variables:
>
> Lad1 = corr(Fsc1,V1) // loadings for the first 150 variables
> Lad2 = corr(Fsc1,V2) // loadings for the next 150 variables
> ...
> Ladx = corr(Fsc1,Vx) // loadings for the last 150 variables
>
> put them all together to get a combined loadings matrix for rotations
> Lad = {Lad1,Lad2,Lad3,...,Ladx}
>
> After that you can perform the rotations.
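In NumPy terms, the blockwise procedure quoted above might look like the following sketch. The sizes and all variable names here are mine (toy stand-ins for the 11000-variable, 150-per-block case), and the data are assumed standardized so that correlations reduce to cross products:

```python
import numpy as np

rng = np.random.default_rng(0)
n_cases, n_vars, block = 500, 12, 4   # toy stand-ins for 11000 variables / 150-variable blocks

# variables vertical (rows), cases horizontal (columns), standardized
V = rng.standard_normal((n_vars, n_cases))
V = (V - V.mean(axis=1, keepdims=True)) / V.std(axis=1, keepdims=True)

V1 = V[:block]                          # first block of variables
R = (V1 @ V1.T) / n_cases               # correlation matrix of the first block
L0 = np.linalg.cholesky(R)              # lower-triangular "loadings" via Cholesky
Fsc1 = np.linalg.solve(L0, V1)          # raw factor scores, i.e. inv(L0) @ V1

# loadings for every block = correlations between factors and variables
# (Fsc1 has zero mean and unit variance by construction, so corr is a cross product)
Lad = np.vstack([(V[i:i + block] @ Fsc1.T) / n_cases
                 for i in range(0, n_vars, block)])

# sanity check: the first block is reproduced exactly by its own factors
print(np.allclose(Lad[:block] @ Fsc1, V1))
```

Note that only one block-sized correlation matrix is ever formed; the combined loadings matrix Lad is n_vars by block, never n_vars by n_vars.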
>
> -----------
>
> To get scores, you use matrix-algebra:
>
> Since (approximately)
> [1] Lad * Fsc = V
> [2] Lad'*Lad * Fsc = Lad' * V // here Lad'*Lad is only 150x150
> [3] ILad = inv(Lad'*Lad)
> [4] ILT = ILad*Lad'
> [5] Fsc = ILT * V
>
> you can get factor-scores just by multiplying your variable-values
> by the matrix ILT, which has one dimension of only 150 at most.
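The least-squares score computation in steps [1]-[5] is straightforward to sketch in NumPy. The sizes and names below are mine (toys in place of the 11000-variable, 150-factor case), and the data are generated exactly by the model, so the recovery here is exact:

```python
import numpy as np

rng = np.random.default_rng(1)
n_vars, n_factors, n_cases = 12, 4, 200   # toy stand-ins for 11000 vars / 150 factors

Lad = rng.standard_normal((n_vars, n_factors))   # combined loadings matrix
Fsc_true = rng.standard_normal((n_factors, n_cases))
V = Lad @ Fsc_true                               # data generated exactly by model [1]

# steps [2]-[5]: normal equations; Lad'*Lad is only n_factors x n_factors
ILad = np.linalg.inv(Lad.T @ Lad)   # [3] small inverse, never n_vars x n_vars
ILT = ILad @ Lad.T                  # [4] the one matrix applied to the big data
Fsc = ILT @ V                       # [5] factor scores

print(np.allclose(Fsc, Fsc_true))
```

In practice `np.linalg.lstsq(Lad, V)` does the same job with better numerical behavior than forming the explicit inverse.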
>
> HTH
>
> Gottfried Helms
Gottfried, I haven't had a chance to spend a lot of time thinking about
the method you describe above ... however ... there are algorithms that
will extract the first few eigenvectors from a huge matrix such as the
one described. One such algorithm is the NIPALS algorithm; it requires
much less memory than a traditional PCA algorithm, as it computes the
vectors sequentially. No matrix inversions or Cholesky decompositions
are required. It is used widely in chemometrics and elsewhere.
I am wondering if you have tried the NIPALS algorithm, and how it
compares (both in terms of accuracy and in terms of computer resources
and time used) to the algorithm you describe. The NIPALS algorithm gives
the exact same results (to within roundoff error) as the traditional PCA
algorithm. Furthermore, it takes about 10 lines of MATLAB code to write.
These are two powerful reasons to consider using the NIPALS algorithm.
Using small amounts of memory, and not having to invert or find
Cholesky decompositions of large matrices, are further benefits.
Reference: Martens, H. and Martens, M. (2001) "Multivariate Analysis of
Quality", John Wiley and Sons, Ltd. See Appendix A5
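For the curious, here is a minimal NIPALS sketch (in NumPy rather than MATLAB; the function and variable names are mine, not from the reference). It extracts components one at a time by deflation and, as claimed above, agrees with SVD-based PCA up to sign:

```python
import numpy as np

def nipals(X, n_components, tol=1e-10, max_iter=500):
    """NIPALS PCA: scores T and loadings P, extracted one component at a time."""
    X = X - X.mean(axis=0)              # center the columns (variables)
    T, P = [], []
    for _ in range(n_components):
        t = X[:, np.argmax(X.var(axis=0))].copy()   # start from highest-variance column
        for _ in range(max_iter):
            p = X.T @ t / (t @ t)       # project data onto the current score vector
            p /= np.linalg.norm(p)      # normalize the loading vector
            t_new = X @ p               # update the score vector
            if np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new):
                t = t_new
                break
            t = t_new
        X = X - np.outer(t, p)          # deflate: remove the fitted component
        T.append(t)
        P.append(p)
    return np.column_stack(T), np.column_stack(P)

# agreement with SVD-based PCA on a small random matrix
rng = np.random.default_rng(2)
X = rng.standard_normal((60, 8))
T, P = nipals(X, 2)
U, s, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
print(np.allclose(np.abs(T[:, 0]), np.abs(U[:, 0] * s[0]), atol=1e-6))
```

The key memory property: at no point is a variables-by-variables matrix formed, so the storage cost stays at the size of the (deflated) data matrix plus one score and one loading vector per component.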
Question for the original poster (Dr. V. Ravi): do you really need all
of the PCA vectors, all 150 of them? Or are the first few enough?
--
Paige Miller
[EMAIL PROTECTED]
http://www.kodak.com
"It's nothing until I call it!" -- Bill Klem, NL Umpire
"When you get the choice to sit it out or dance, I hope you dance" --
Lee Ann Womack
=================================================================
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at:
. http://jse.stat.ncsu.edu/ .
=================================================================