Markus, a few notes:

- if you do PCA on uncentered data, by computing the eigenvalues of the
  uncentered covariance (second-moment) matrix, bands with a larger
  mean will get more influence on the final PCs. I have so far not
  found an argument why this would be desirable.

- if you do PCA on (band - mean(band)) / sd(band), you first normalize
  (scale) each variable to mean zero and unit variance. This procedure
  is identical to doing PCA on the correlation matrix. It means that,
  unlike for unscaled variables, variables with larger variance will
  not get more influence on the PCs than others. For image analysis I
  can see a place for both: if bands with low variance carry
  insignificant and perhaps noisy information, you may want to
  downweight them by leaving the data unscaled. Or not, if they contain
  (equally) important information. Scaling becomes essential when you
  compute PCs from a mix of things with incomparable units, such as
  image bands and DTMs.

- Only in the case of normalized variables, or equivalently PCA on
  correlations, does it make sense to select PCs with an eigenvalue
  larger than 1. The reasoning is fairly weak, but goes like this: if a
  PC has eigenvalue > 1, it explains more variance than any of the
  original variables, which all have variance 1.
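To make the three notes above concrete, here is a minimal sketch in Python/NumPy. It assumes synthetic, made-up "bands" (the means, variances and variable names are mine, chosen for illustration); it is neither i.pca nor prcomp output, and only checks the linear algebra behind the claims.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: three correlated "bands" with very different
# means and variances (illustration only, not real imagery).
n = 5000
base = rng.normal(size=n)
bands = np.column_stack([
    1000.0 + 5.0 * base + rng.normal(scale=1.0, size=n),  # large mean
    10.0 + 50.0 * base + rng.normal(scale=20.0, size=n),  # large variance
    0.5 * base + rng.normal(scale=0.5, size=n),           # small mean and variance
])


def eigenvalues(sym):
    """Eigenvalues of a symmetric matrix, largest first."""
    return np.linalg.eigvalsh(sym)[::-1]


# Note 1: uncentered PCA uses the second-moment matrix X'X / (n - 1),
# so the band with the largest mean dominates the first eigenvector.
second_moment = bands.T @ bands / (n - 1)
first_uncentered = np.linalg.eigh(second_moment)[1][:, -1]

# Centered but unscaled PCA: the band with the largest variance dominates.
covariance = np.cov(bands, rowvar=False)
first_centered = np.linalg.eigh(covariance)[1][:, -1]

# Note 2: PCA on standardized bands equals PCA on the correlation matrix.
z = (bands - bands.mean(axis=0)) / bands.std(axis=0, ddof=1)
ev_standardized = eigenvalues(np.cov(z, rowvar=False))
ev_correlation = eigenvalues(np.corrcoef(bands, rowvar=False))

# Note 3: the "eigenvalue > 1" rule, applied to the correlation matrix,
# whose eigenvalues sum to the number of variables.
keep = ev_correlation > 1.0

# Rescaling the raw data by a constant c multiplies the covariance
# eigenvalues by c**2 (c = 0.0001 gives the factor 1E-8).
ev_raw = eigenvalues(covariance)
ev_rescaled = eigenvalues(np.cov(0.0001 * bands, rowvar=False))

print("dominant band, uncentered:", int(np.argmax(np.abs(first_uncentered))))
print("dominant band, centered:  ", int(np.argmax(np.abs(first_centered))))
print("correlation eigenvalues:  ", ev_correlation)
print("components kept (> 1):    ", keep)
```

With these made-up numbers the first uncentered component loads almost entirely on the large-mean band, while the centered, unscaled one follows the large-variance band, which is the effect described in the first two notes.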

Maybe I should Cc: this to the wiki.
--
Edzer

Markus Metz wrote:
>
> Edzer Pebesma wrote:
>> Markus Metz wrote:
>>
>>>> I'm more familiar with non-spatial PCA, so it's high time I read the
>>>> manual of i.pca, and the new wiki page on it...
>>>>
>> I think there's no such thing as spatial or non-spatial PCA. There's
>> just PCA.
>>
> That was a feeble attempt to buy time to go through some statistics
> literature ;-)
>
> So it seems that this thread is about the different values for
> eigenvalues. AFAIKT, the answer is in the very first post of this
> thread [1]. It seems that i.pca output is supposed to be identical to
> prcomp(center=FALSE, scale=FALSE) output in R, because a PCA is
> scale-sensitive and the eigenvalue as reported by i.pca is the
> variance of the raw, unstandardised data. If outputs are not
> identical, either R or grass do some hidden modification or there is a
> bug in either grass or R (all within limits, e.g. identical up to the
> 5th digit in scientific format is fine?).
>
> Some textbooks give a rule of thumb for further analysis to use only
> components with an eigenvalue >=1 which obviously only works if the
> eigenvalue is calculated from standardised values (center=TRUE,
> scale=TRUE or e.g. r.mapcalc standardised_map = (map - mean) /
> stddev). E.g., comparing the results of MODIS raw and MODIS scaled
> with 0.0001 should give <eigenvalue #x of MODIS scaled> = 1E-8 *
> <eigenvalue #x of MODIS raw>.
>
> BTW, the rescaling method of i.pca is not very convincing, as pointed
> out by Augustin Lobo. IMHO, fool-proof would be normalization (x -
> mean) / stddev.
>
> [1] http://lists.osgeo.org/pipermail/grass-user/2009-March/049306.html

--
Edzer Pebesma
Institute for Geoinformatics (ifgi), University of Münster
Weseler Straße 253, 48151 Münster, Germany.
Phone: +49 251 8333081, Fax: +49 251 8339763
http://ifgi.uni-muenster.de/
http://www.springer.com/978-0-387-78170-9
[email protected]
_______________________________________________
grass-stats mailing list
[email protected]
http://lists.osgeo.org/mailman/listinfo/grass-stats
