Radford Neal wrote:
>
> In article <[EMAIL PROTECTED]>,
> Dr. V. Ravi <[EMAIL PROTECTED]> wrote:
>
> > Thank you very much to all of you who gave invaluable suggestions and
> > links to papers. It is pretty common in almost all bioinformatics
> > problems that the number of variables swamps the number of samples.
> > Hence, I wanted to perform feature selection by selecting the first few
> > PCs before going for further studies. My student has used the 'snapshot
> > method' of L. Sirovich, contained in MATLAB. I wanted to have a
> > theoretical justification of what he had done. However, the other
> > methods suggested by the scholars in this forum do certainly enhance my
> > understanding of the problem and the possible solutions.
>
> I think the responses so far may not have made clear that the situation
> is really very simple. NO SPECIAL ALGORITHMS ARE NEEDED. You just need
> to transpose the data matrix, do PCA with whatever algorithm you like,
> and then transform the results.
>
> Suppose X is the data matrix, with the columns being variables and the
> rows being cases ("samples" in your terminology). Assume the
> variables have been centered by subtracting their mean, so the sum of
> the values in each column of X is zero. (Some people might want
> to scale the variables to have variance one too.) For your situation,
> the number of columns is much greater than the number of rows.
>
> The principal components are simply the eigenvectors of the matrix X'X
> (i.e., X-transpose times X). Since X'X is big, however, we'd rather not
> deal with it. So instead we find the eigenvectors of XX'. Suppose v
> is such an eigenvector, with eigenvalue a, so that (XX')v = av. Then
> it's easy to see that (X'X)(X'v) = X'(XX')v = X'av = a(X'v), so X'v
> is an eigenvector of X'X, with eigenvalue a.
>
> So you just need to find the matrix of principal component vectors for
> X', rather than X, then multiply by X' to get the principal components
> you want. If you want your principal component vectors to have length
> one, you'll need to adjust for that too.
>
[...]
And if the OP wants to read more about this, see the literature on
'eigenfaces' for face recognition. There, the naive solution would be to
estimate and diagonalise a covariance matrix (X'X above) of vectorised
images, which for 256 x 256 images is 65536 x 65536.
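
For what it's worth, the recipe quoted above can be sketched in a few
lines of Python with NumPy. This is only a minimal illustration, not the
MATLAB routine the OP's student used: the function name
pca_via_transpose, the random test data and the sizes n = 20, p = 500
are all my own assumptions.

```python
import numpy as np

def pca_via_transpose(X, k):
    """Top-k principal component vectors of X (rows = cases, columns =
    variables), found from the small n x n matrix XX' instead of X'X.
    Hypothetical helper; k should be at most n - 1 after centring."""
    Xc = X - X.mean(axis=0)              # centre each variable (column)
    G = Xc @ Xc.T                        # XX', only n x n
    evals, V = np.linalg.eigh(G)         # symmetric solver, ascending order
    order = np.argsort(evals)[::-1][:k]  # indices of the k largest eigenvalues
    evals, V = evals[order], V[:, order]
    W = Xc.T @ V                         # each column is X'v, an eigenvector of X'X
    W /= np.linalg.norm(W, axis=0)       # rescale each column to length one
    return evals, W

# Made-up data: 20 cases, 500 variables (far more columns than rows).
rng = np.random.default_rng(0)
X = rng.standard_normal((20, 500))
evals, W = pca_via_transpose(X, k=5)
```

The columns of W are unit-length eigenvectors of X'X, and evals holds the
matching eigenvalues, which XX' and X'X share. In the 256 x 256 image
case, X'X would have 65536^2 (about 4.3 billion) entries, whereas XX' has
only n^2 for n images.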
Best regards,
Jon C.
--
Jonathan G Campbell BT48 7PG [EMAIL PROTECTED] 028 7126 6125
http://homepage.ntlworld.com/jg.campbell/