(Since there were some small errors in my previous post,
 I am sending a complete correction. Please excuse the inconvenience.)


"Arthur J. Kendall" wrote:
> 
> It is unusual.
> SPSS has had PCA under FACTOR since at least 1972.
> include the specification
>   /extraction = PC
> or equivalently
>  /extraction = PA1
> on the FACTOR procedure.
> 
> I don't know if it can handle 11,000 variables.
> 

It would need a *lot* of time and memory (at least ~121*16 MB just for the
correlation matrix).
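
To make the arithmetic behind that figure explicit, here is a small Python sketch. The 16 bytes per entry is simply what the "121*16 MB" estimate implies; with ordinary 8-byte doubles the total would be half of that.

```python
# Rough storage estimate for a dense p x p correlation matrix.
p = 11000
entries = p * p                      # 11000 * 11000 = 121 million entries
bytes_per_entry = 16                 # assumption implied by the "121*16 MB" estimate
total_mb = entries * bytes_per_entry / 1e6

print(entries, total_mb)
```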

I remember reading articles about "large matrices" or "large sparse matrices"
some years ago. Search via Google, and also in sci.stat.consult; they discussed
matrices of these dimensions.

Concerning 150 cases with 11000 variables: you get at most 150 factors, and
everything beyond that is linearly dependent.
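
A quick way to convince yourself of this limit is a toy check in Python/NumPy. The sizes below (15 cases, 40 variables) are hypothetical stand-ins for 150 and 11000, and the data are random. Because correlations are computed on centered data, the rank is in fact bounded by the number of cases minus one.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cases, n_vars = 15, 40                      # toy sizes, stand-ins for 150 and 11000
V = rng.standard_normal((n_vars, n_cases))    # variables vertical, cases horizontal

R = np.corrcoef(V)                            # n_vars x n_vars correlation matrix
rank = np.linalg.matrix_rank(R)               # number of non-degenerate factors

# The rank cannot exceed the number of cases, however many variables there are.
print(R.shape, rank)
```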

If nothing else helps, you could do a PCA on the first 150 variables and save the
factor scores.
Then you can correlate all variables with these factor scores and put the
correlations together to form a proper factor-loadings matrix. This way you can use
as many variables per run as your program can handle. Put together, this
gives you a factor-loadings matrix of 11000*150 (~26 MB per matrix), which might be
*a little* easier to handle than 11000*11000. With SPSS you can read a ready-made
factor-loadings matrix directly into the procedure.


The only drawback is that you don't get factor scores this way. If you need them,
you can use the matrix-language facility to build the pseudoinverse of your final
factor-loadings matrix (which has one dimension of only 150) and matrix-multiply
it with your raw data.

 Say
    V      - array of all 11000 variables; variables vertical, cases horizontal
    --------------------
    V1     - array of the first 150 variables
    V2     - array of the next  150 variables
    ...
 then
    R    = corr(V1,V1')
    L0   = cholesky(R) // compute loadings matrix, for instance with the Cholesky method
    I0   = inv(L0)     // inverse of loadings matrix for scores calculation
    Fsc0 = I0*V1       // compute raw scores for your 150 factors

 Now compute loadings for all variables. Their loadings are the
 correlations between factors and variables:

    Lad1  = corr(Fsc0,V1)   // loadings for the first 150 variables
    Lad2  = corr(Fsc0,V2)   // loadings for the next 150 variables
   ...
    Ladx  = corr(Fsc0,Vx)   // loadings for the last 150 variables

 Put them all together to get a combined loadings matrix for rotations:
    Lad = {Lad1,Lad2,Lad3...}

 After that you can perform the rotations.
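
 The blockwise procedure above could be sketched in Python/NumPy roughly as
 follows. This is only an illustration under my own assumptions, not SPSS
 syntax: the data are random, the sizes are toy values, and the helper names
 standardize and cross_corr are made up for this example. The first block is
 kept smaller than the number of cases so that R is positive definite, which
 the Cholesky step needs.

```python
import numpy as np

def standardize(X):
    # Center and scale each variable (row) to mean 0 and variance 1.
    X = X - X.mean(axis=1, keepdims=True)
    return X / X.std(axis=1, keepdims=True)

def cross_corr(F, X):
    # Correlations between the rows of X (variables) and the rows of F (factors).
    # Result has shape (rows of X, rows of F).
    return standardize(X) @ standardize(F).T / X.shape[1]

rng = np.random.default_rng(0)
n_cases, n_vars, block = 200, 30, 10           # toy sizes (hypothetical)
V = rng.standard_normal((n_vars, n_cases))     # variables vertical, cases horizontal

Z1 = standardize(V[:block])                    # first block of variables
R = Z1 @ Z1.T / n_cases                        # correlation matrix of the first block
L0 = np.linalg.cholesky(R)                     # lower-triangular loadings, R = L0 @ L0.T
Fsc0 = np.linalg.solve(L0, Z1)                 # same as inv(L0) @ Z1: factor scores

# Loadings of every block of variables = correlations with the saved scores.
blocks = [cross_corr(Fsc0, V[i:i + block]) for i in range(0, n_vars, block)]
Lad = np.vstack(blocks)                        # combined loadings, n_vars x block

print(Lad.shape)
```

 For the first block the correlations with the scores reproduce L0 itself,
 which is a handy sanity check before running the remaining blocks.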

 ----------- 

 To get scores, you use matrix algebra:

 Since
 [1]  Lad * Fsc      = V
 [2]  Lad'*Lad * Fsc = Lad' * V   // here Lad'*Lad is only 150*150
 [3]  ILad = inv(Lad'*Lad)
 [4]  ILT  = ILad*Lad'
 [5]  Fsc  = ILT * V

  you can get factor scores just by multiplying your variable values
  by the matrix ILT, which has one dimension of at most 150.
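
  As a small numerical check of steps [1] to [5], here is a Python/NumPy sketch.
  It assumes toy sizes and random matrices, and it builds V exactly from
  Lad * Fsc, so the scores are recovered exactly; with real data the result is
  a least-squares solution rather than an exact one.

```python
import numpy as np

rng = np.random.default_rng(1)
n_vars, n_factors, n_cases = 30, 5, 12              # toy sizes (hypothetical)

Lad = rng.standard_normal((n_vars, n_factors))      # loadings, n_vars x n_factors
Fsc_true = rng.standard_normal((n_factors, n_cases))
V = Lad @ Fsc_true                                  # [1]  Lad * Fsc = V

# [3] and [4]: ILT = inv(Lad'*Lad) * Lad', computed via solve instead of inv
ILT = np.linalg.solve(Lad.T @ Lad, Lad.T)           # n_factors x n_vars
Fsc = ILT @ V                                       # [5]  Fsc = ILT * V

print(np.allclose(Fsc, Fsc_true))                   # scores recovered?
print(np.allclose(ILT, np.linalg.pinv(Lad)))        # ILT is the pseudoinverse of Lad?
```

  Only the small n_factors x n_factors matrix Lad'*Lad is ever inverted, which
  is the point of the derivation.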

HTH

Gottfried Helms