Greetings,

I'm hoping that you could help me resolve a problem that I have been
working on for quite some time.

We are using PCA (principal components analysis) on arbitrarily large
datasets. For this reason we work only with correlation matrices, and
naturally we hope that PCA will significantly reduce the
dimensionality of our data. Currently we use the proportion of the
trace explained to decide which PCs to keep. However, after
considerable reading on the subject, we concluded that this isn't the
best approach. We have therefore decided to look into stopping
rules based on how much residual variability we are willing to
accept. This is where the confusion sets in.
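For reference, the current proportion-of-trace rule can be sketched in Python with NumPy. This is a minimal sketch: the function name `k_by_trace` and the 90% target are illustrative assumptions, not part of any standard API. Note that a rule of the form "accept at most (1 - target) times trace(R) of total residual variability" selects exactly the same k, because the total residual is just the trace minus the retained eigenvalues. Residual-based rules only behave differently when they are applied per variable or per observation.

```python
import numpy as np

def k_by_trace(R, target=0.90):
    """Smallest k whose leading eigenvalues account for at least `target` of trace(R).

    For a p x p correlation matrix, trace(R) = p.
    """
    eigvals = np.linalg.eigvalsh(R)[::-1]             # eigvalsh returns ascending order; reverse it
    explained = np.cumsum(eigvals) / np.trace(R)      # cumulative proportion of the trace
    k = int(np.searchsorted(explained, target)) + 1   # first position where explained >= target
    return min(k, R.shape[0])

# Two nearly collinear standardized variables (illustrative data).
R = np.array([[1.00, 0.99],
              [0.99, 1.00]])
print(k_by_trace(R))  # prints 1: the first PC alone explains 99.5% of the trace
```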

My primary source of information has been J. Edward Jackson's "A
User's Guide to Principal Components." In this book he describes a
number of methods, based on residual analysis, for detecting outliers
in the data. However, it is unclear to me how this bears on deciding
which PCs to keep, since most of the residual-analysis statistics deal
specifically with the scores obtained from an individual data vector.
I suspect that we can decide this by repeatedly testing the diagonal
of the residual matrix, adding one PC at a time, until it meets our
criterion, but I'm not entirely sure. Is this a 'good' way to decide
which PCs to keep? Keep in mind that this must be determined without
user intervention. I also understand that appropriate significance
tests must be performed before this step.
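A minimal sketch of the residual-diagonal idea, under stated assumptions: R is a p x p correlation matrix, and each variable's residual (the diagonal of R minus the part reproduced by the first k PCs, i.e. one minus that variable's communality) must fall at or below a single tolerance. The function name `k_by_residual_diagonal` and the tolerance `max_resid = 0.2` are hypothetical choices for illustration. They are not taken from Jackson, and the sketch performs no significance testing.

```python
import numpy as np

def k_by_residual_diagonal(R, max_resid=0.2):
    """Smallest k for which every diagonal entry of the residual matrix
    R - V_k diag(l_1..l_k) V_k^T is at most max_resid (a hypothetical tolerance).

    Returns k and the residual diagonal at that k.
    """
    eigvals, eigvecs = np.linalg.eigh(R)        # eigen-decomposition of a symmetric matrix (ascending)
    order = np.argsort(eigvals)[::-1]           # re-sort in decreasing order of eigenvalue
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    p = R.shape[0]
    for k in range(1, p + 1):
        Vk = eigvecs[:, :k]
        fitted = (Vk * eigvals[:k]) @ Vk.T      # part of R reproduced by the first k PCs
        resid_diag = np.diag(R - fitted)        # unexplained variance of each standardized variable
        if np.all(resid_diag <= max_resid):
            return k, resid_diag
    return p, resid_diag  # only reached if max_resid is below rounding error

# Two correlated variables plus one unrelated variable (illustrative data).
R = np.array([[1.0, 0.9, 0.0],
              [0.9, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
k, resid = k_by_residual_diagonal(R)
print(k)  # prints 2
```

In this example the first PC alone explains about 63% of the trace, yet it leaves the third variable entirely in the residual; the per-variable check only stops once every variable is reproduced to within the tolerance.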

Also, just out of curiosity: if we took a new sample into principal
component space, reduced its dimensionality, and applied the inverse
transformation to it, would we get a value in the original space that
closely matches, within our residual criterion, what the actual data
element should be? It would seem so, given that we were willing to
sacrifice a certain amount of variability in performing the PCA: the
inverted value would be close to the actual value, but off by roughly
the amount our criterion allows.
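The round trip can be sketched as follows. Assumptions: the PCA is done on the correlation matrix, so the sample is standardized with the training mean and standard deviation before projection; the helper name `reconstruct` and the synthetic data are hypothetical. The sum of squared residuals it returns is the quantity used by residual-based outlier checks such as the Q statistic discussed in Jackson's book.

```python
import numpy as np

def reconstruct(x, mean, std, eigvecs, k):
    """Project x onto the first k PCs of a correlation-matrix PCA, then map back.

    eigvecs must have its columns sorted by decreasing eigenvalue.
    Returns the reconstruction in original units and the sum of squared
    residuals in standardized units.
    """
    z = (x - mean) / std            # standardize: the PCA was done on the correlation matrix
    Vk = eigvecs[:, :k]             # loadings of the k retained PCs
    scores = Vk.T @ z               # project into PC space, keeping only k components
    z_hat = Vk @ scores             # invert the (truncated) transformation
    x_hat = z_hat * std + mean      # undo the standardization
    resid = z - z_hat
    return x_hat, float(resid @ resid)

# Synthetic data (an assumption for illustration): 3 variables, two strongly correlated.
rng = np.random.default_rng(0)
base = rng.normal(size=500)
X = np.column_stack([base,
                     base + 0.1 * rng.normal(size=500),
                     rng.normal(size=500)])
mean, std = X.mean(axis=0), X.std(axis=0, ddof=1)
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

x_new = X[0]
x_full, q_full = reconstruct(x_new, mean, std, eigvecs, k=3)  # all PCs kept: exact up to rounding
x_two, q_two = reconstruct(x_new, mean, std, eigvecs, k=2)    # smallest PC dropped: approximate
```

For the data used to build R, the squared residual summed over all n samples and divided by n - 1 equals the sum of the discarded eigenvalues. A residual criterion therefore controls the reconstruction error on average only; an individual sample, particularly an outlier, can still be reconstructed poorly, which is what residual-based outlier tests are designed to flag.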

Many thanks for your assistance, 

Jason Walter
CSC Graduate Student 
[EMAIL PROTECTED] 





===========================================================================
This list is open to everyone.  Occasionally, less thoughtful
people send inappropriate messages.  Please DO NOT COMPLAIN TO
THE POSTMASTER about these messages because the postmaster has no
way of controlling them, and excessive complaints will result in
termination of the list.

For information about this list, including information about the
problem of inappropriate messages and information about how to
unsubscribe, please see the web page at
http://jse.stat.ncsu.edu/
===========================================================================
