Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

Markus Metz Wed, 01 Apr 2009 09:57:10 -0700


Edzer Pebesma wrote:

Markus, a few notes:


- if you do PCA on uncentered data, by computing the eigenvalues of the
uncentered covariance matrix, this implies that bands with a larger mean
will get more influence on the final PCAs. I have so far not managed to
find an argument why this would be desirable.

Add it to the wiki? E.g. bands entered in a PCA should have the same mean, but normalization is also an option.
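To see the effect Edzer describes, here is a minimal sketch in Python/NumPy. It is not GRASS i.pca or R's prcomp(); the helper pca_eig() and the synthetic bands are hypothetical. It computes PC1 once from the centered covariance matrix and once from the uncentered second-moment matrix of two correlated bands that have the same expected variance but means of 100 and 1000.

```python
import numpy as np


def pca_eig(x, center=True):
    """PCA by eigen-decomposition of a band-by-band matrix.

    x: array of shape (n_pixels, n_bands).
    center=True  -> covariance matrix (band means removed first)
    center=False -> uncentered second-moment matrix (band means kept)
    Returns eigenvalues in descending order and eigenvectors as columns.
    """
    if center:
        x = x - x.mean(axis=0)
    m = x.T @ x / (x.shape[0] - 1)
    vals, vecs = np.linalg.eigh(m)  # eigh returns ascending eigenvalues
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]


rng = np.random.default_rng(42)
n = 10_000
common = rng.normal(0, 10, n)  # shared signal, so the two bands are correlated
# Hypothetical bands: same expected variance, means of 100 and 1000.
bands = np.column_stack([
    100 + common + rng.normal(0, 3, n),
    1000 + common + rng.normal(0, 3, n),
])

_, v_centered = pca_eig(bands, center=True)
_, v_uncentered = pca_eig(bands, center=False)

# Absolute loadings on PC1 (the sign of an eigenvector is arbitrary).
print("centered   PC1:", np.abs(v_centered[:, 0]).round(3))
print("uncentered PC1:", np.abs(v_uncentered[:, 0]).round(3))
```

With centering, both bands should load roughly equally (about 0.707 each) on PC1. Without centering, PC1 essentially follows the mean vector, so the second band's loading should be close to 1000/sqrt(100^2 + 1000^2), about 0.995: the band with the larger mean dominates.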

- if you do PCA on (band-mean)/sd(band), it means that you first
normalize (scale)

I think scale and normalize are two different things.

each variable to mean zero and unit variance. This
procedure is identical to doing PCA on the correlation matrix. It means
that, unlike for unscaled variables, variables with larger variance will
not get more influence on the PCA than others. For image analysis I can
see a place for both; if bands with low variance indicate insignificant
and perhaps noisy information, you may downweight them.

Variance depends on range; I would rather use something like the coefficient of variation (stddev/mean) to get a scale-independent indicator of the amount of information in a given band. A downscaled band (e.g. with the MODIS scale factor of 0.0001) still has the same information but lower variance.
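Both points can be checked numerically with a small NumPy sketch on synthetic data; the band values, the 0.0001 scale factor applied to one band, and the helper cv() are assumptions for illustration. It shows that the covariance matrix of bands standardized as (band-mean)/sd(band) equals the correlation matrix of the raw bands, and that rescaling a band multiplies its variance by the squared scale factor while leaving its coefficient of variation, and the correlation matrix, unchanged. The coefficient of variation is only meaningful for bands with a positive mean.

```python
import numpy as np


def cv(v):
    """Coefficient of variation, sd / mean (assumes a positive mean)."""
    return v.std(ddof=1) / v.mean()


rng = np.random.default_rng(0)
n = 10_000
signal = rng.normal(0, 1, n)
b1 = 50 + 5 * signal + rng.normal(0, 2, n)
raw_b2 = 3000 + 400 * signal + rng.normal(0, 150, n)
b2 = raw_b2 * 0.0001  # hypothetical band stored with a 0.0001 scale factor
x = np.column_stack([b1, b2])

# Standardize each band: (band - mean) / sd(band)
z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)
cov_z = np.cov(z, rowvar=False)
corr_x = np.corrcoef(x, rowvar=False)
# Same matrix, hence the same eigenvalues and PCs
print("cov(standardized) == corr(raw):", np.allclose(cov_z, corr_x))

# Undoing the scale factor multiplies the variance by 1 / 0.0001**2 = 1e8 ...
var_ratio = raw_b2.var(ddof=1) / b2.var(ddof=1)
print("variance ratio raw/scaled:", var_ratio)
# ... but changes neither the CV nor the correlation matrix.
corr_raw = np.corrcoef(np.column_stack([b1, raw_b2]), rowvar=False)
print("same CV:", np.isclose(cv(raw_b2), cv(b2)))
print("same correlation matrix:", np.allclose(corr_raw, corr_x))
```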

- Only in case of normalized variables, or equivalently PCA on
correlations, it makes sense to select PCs with an eigenvalue larger
than 1. The reasoning is fairly weak, but goes like this: if a PC has
eigenvalue > 1, it explains more variance than any of the original
variables, which all have variance 1.

Sounds good to me: why should I use a component that explains less than any of the original bands? The whole purpose of a PCA is variable reduction, i.e. getting a new set of variables, each explaining the whole dataset better than any one of the original variables/bands. A PCA produces as many components as there are input variables, so some selection is usually necessary for further processing; the criterion could also be the percentage of explained variance. OTOH, sometimes only the first component is of interest. There may be exceptions in imagery processing, e.g. haze reduction (I would have to read up on imagery processing too before I could say more about where components with eigenvalue < 1 could be useful).
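Both selection rules can be sketched on the eigenvalues of the correlation matrix. Because that matrix has ones on its diagonal, its trace, and therefore the sum of its eigenvalues, equals the number of bands, so the eigenvalues average exactly 1; that is where the "eigenvalue > 1" (Kaiser) rule comes from. The function select_components(), the 90% threshold, and the synthetic 5-band image below are hypothetical choices, not part of i.pca or R.

```python
import numpy as np


def select_components(x, min_cum_share=0.9):
    """Rank PCs of the correlation matrix and apply two selection rules.

    x: (n_pixels, n_bands) array.
    Returns the eigenvalues (descending), the number of PCs with
    eigenvalue > 1 (Kaiser rule), and the number of PCs needed to reach
    min_cum_share of the total variance.
    """
    corr = np.corrcoef(x, rowvar=False)
    vals = np.linalg.eigvalsh(corr)[::-1]  # descending order
    # trace(corr) equals the number of bands, so the eigenvalues average 1
    n_kaiser = int(np.sum(vals > 1.0))
    cum_share = np.cumsum(vals) / vals.sum()
    n_share = int(np.searchsorted(cum_share, min_cum_share)) + 1
    n_share = min(n_share, len(vals))  # guard against rounding at 100%
    return vals, n_kaiser, n_share


rng = np.random.default_rng(1)
n = 10_000
s1, s2 = rng.normal(size=(2, n))
# Hypothetical 5-band image: bands 1-3 follow s1, bands 4-5 follow s2.
x = np.column_stack([s1 + rng.normal(0, 0.3, n) for _ in range(3)]
                    + [s2 + rng.normal(0, 0.3, n) for _ in range(2)])
vals, n_kaiser, n_share = select_components(x, min_cum_share=0.9)
print("eigenvalues:", vals.round(3))
print("Kaiser rule keeps:", n_kaiser, "| 90% variance needs:", n_share)
```

Here the two underlying signals give two eigenvalues well above 1 and three close to 0, so both rules keep two components; on real imagery the two rules can disagree.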


_______________________________________________
grass-stats mailing list
grass-stats@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/grass-stats
