Re: PCA on a large data matrix of order 150x11000 - its really simple

Jonathan G Campbell Sun, 27 Oct 2002 00:03:07 -0700

Radford Neal wrote:
> 
> In article <[EMAIL PROTECTED]>,
> Dr. V. Ravi <[EMAIL PROTECTED]> wrote:
> 
> > Thanks very much to all of you who gave invaluable suggestions and links
> > to papers. It is pretty common in almost all bioinformatics problems
> > that the number of variables swamps the number of samples. Hence, I
> > wanted to perform feature selection by selecting the first few PCs
> > before going for further studies. My student has used the 'snapshot
> > method' of L. Sirovich, contained in MATLAB. I wanted to have a
> > theoretical justification of what he had done. However, the other
> > methods suggested by the scholars in this forum do certainly enhance my
> > understanding of the problem and the possible solutions.
> 
> I think the responses so far may not have made clear that the situation
> is really very simple.  NO SPECIAL ALGORITHMS ARE NEEDED.  You just need
> to transpose the data matrix, do PCA with whatever algorithm you like,
> and then transform the results.
> 
> Suppose X is the data matrix, with the columns being variables and the
> rows being cases ("samples" in your terminology).  Assume the
> variables have been centered by subtracting their mean, so the sum of
> the values in each column of X is zero.  (Some people might want
> to scale the variables to have variance one too.)  For your situation,
> the number of columns is much greater than the number of rows.
> 
> The principal component directions are simply the eigenvectors of the
> matrix X'X (i.e., X-transpose times X).  Since X'X is big, however, we'd
> rather not deal with it.  So instead we find the eigenvectors of XX'.
> Suppose v is such an eigenvector, with eigenvalue a, so that (XX')v = av.
> Then it's easy to see that (X'X)(X'v) = X'(XX')v = X'av = a(X'v), so X'v
> is an eigenvector of X'X, with eigenvalue a (provided a is nonzero, so
> that X'v is not the zero vector).
> 
> So you just need to find the matrix of principal component vectors for
> X', rather than X, then multiply by X' to get the principal components
> you want.  If you want your principal component vectors to have length
> one, you'll need to adjust for that too.
> 
 [...]


And if the OP wants to read more about this, see the literature of
'eigenfaces' for face recognition. Here, the naive solution might be to
estimate and diagonalise a covariance matrix (X'X above) of vectorised
images, which, for 256 x 256 images is 65536 x 65536.
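
For concreteness, the transpose trick quoted above can be written out in
a few lines. What follows is only a minimal sketch in Python with NumPy,
not the method anyone in this thread actually used: the function name
pca_via_gram, the parameter k and the random test matrix are all made up
for illustration. It assumes rows are cases and columns are variables,
and that k is smaller than the number of cases, so that the eigenvalues
kept are nonzero.

```python
import numpy as np

def pca_via_gram(X, k):
    """Top-k principal directions of X (rows = cases, columns = variables),
    obtained from the small n x n matrix XX' rather than the p x p matrix X'X.
    Hypothetical helper for illustration; assumes k < number of rows."""
    Xc = X - X.mean(axis=0)              # centre each variable (column)
    G = Xc @ Xc.T                        # XX', only n x n
    evals, V = np.linalg.eigh(G)         # symmetric solver, ascending order
    order = np.argsort(evals)[::-1][:k]  # indices of the k largest eigenvalues
    evals, V = evals[order], V[:, order]
    W = Xc.T @ V                         # X'v: eigenvectors of X'X, p x k
    W /= np.linalg.norm(W, axis=0)       # rescale each column to length one
    return evals, W

# Made-up data with the shape discussed in this thread: 150 cases, 11000 variables.
rng = np.random.default_rng(0)
X = rng.standard_normal((150, 11000))
evals, W = pca_via_gram(X, k=5)
scores = (X - X.mean(axis=0)) @ W        # projections of the cases onto the PCs
```

The only eigenproblem solved here is 150 x 150; the 11000 x 11000 matrix
X'X is never formed. The final division by the column norms is the
"adjust for that too" step, since X'v has length sqrt(a) when v has
length one.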

Best regards,

Jon C.

-- 
Jonathan G Campbell BT48 7PG [EMAIL PROTECTED] 028 7126 6125
http://homepage.ntlworld.com/jg.campbell/
