Re: [GRASS-stats] Re: [GRASS-user] Testing i.pca ~ prcomp(), m.eigensystem ~ princomp()

Markus Metz Thu, 02 Apr 2009 00:50:08 -0700


Edzer Pebesma wrote:

Markus Metz wrote:

I think scale and normalize are two different things.

I believe that in statistics these two words don't have a generally
accepted definition. They're useful as long as you explain what you mean
by them.

At least in the statistics literature I use, these two methods are differently defined. Scaling is like r.rescale, and normalization converts data to a mean of 0 and a stddev of 1, the data distribution is changed to a standard normal distribution. But usually I wouldn't worry too much about terms as long as it is explained what they mean.
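To make the distinction concrete, here is a minimal Python sketch of the two operations as described above: range rescaling (in the spirit of r.rescale) and conversion to mean 0 and standard deviation 1. The function names and the pixel values are made up for illustration, and this is not the r.rescale implementation. Note that the second operation only shifts and scales the values; it does not by itself change the shape of the distribution.

```python
import statistics


def rescale(values, new_min=0.0, new_max=255.0):
    """Linearly map values onto [new_min, new_max] (range rescaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot rescale constant data")
    factor = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * factor for v in values]


def standardize(values):
    """Subtract the mean and divide by the sample standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


band = [12.0, 15.0, 20.0, 22.0, 31.0]  # hypothetical pixel values
scaled = rescale(band)        # same ordering, new range 0..255
zscores = standardize(band)   # mean 0, stddev 1, same shape as before
```

For a PCA, feeding standardized variables to a covariance-based analysis is equivalent to working from the correlation matrix, which is why the choice of terms matters less than stating which operation was applied.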

Well, PCA only captures covariance or correlation, meaning linear
relationships, and it may be the case that the most interesting features
are non-linear.

So if a PCA does not capture non-linear relationships, I don't see how it could help to use PC's that explain nearly no variation in the dataset. And you could do e.g. a log transform first, or whatever else is appropriate to convert the suspected type of non-linear relation to a linear relation and then feed the transformed variables to a PCA.
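The suggestion to transform first can be sketched in a few lines of Python. The data below are made up: y depends exponentially on x, so the linear (Pearson) correlation that a PCA would see is weak on the raw values and becomes perfect after a log transform. The helper `pearson` is written out here only to keep the example self-contained.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x = [float(i) for i in range(1, 11)]
y = [math.exp(0.8 * xi) for xi in x]  # hypothetical exponential relation

r_raw = pearson(x, y)                              # noticeably below 1
r_log = pearson(x, [math.log(yi) for yi in y])     # essentially 1
```

Which transform is appropriate depends on the suspected form of the relation; the log is only one example.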

For instance, NDVI is the ratio of a sum over a
difference (or reversed?), which cannot be expressed as a linear
combination of bands.

Not directly, but being a normalized difference (should be standardised not normalized) it can be approximated with linear combinations, i.e. there is at least some correlation between the raw bands and a normalized difference calculated from them.
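For reference, NDVI is the difference over the sum: (NIR - Red) / (NIR + Red). The following Python sketch, using made-up reflectance values for six pixels, illustrates the point about correlation: NDVI is not a linear combination of the bands, yet it correlates with each raw band and closely with the plain difference NIR - Red. The numbers are hypothetical and the strength of the correlation will vary with real imagery.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical reflectances, from bare soil to dense vegetation
red = [0.30, 0.25, 0.20, 0.12, 0.08, 0.05]
nir = [0.32, 0.30, 0.35, 0.40, 0.45, 0.50]

ndvi = [(n - r) / (n + r) for n, r in zip(nir, red)]
diff = [n - r for n, r in zip(nir, red)]  # a linear combination of bands

r_ndvi_nir = pearson(ndvi, nir)    # positive
r_ndvi_red = pearson(ndvi, red)    # negative
r_ndvi_diff = pearson(ndvi, diff)  # close to 1 for these values
```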

The first PCA(s?) usually express brightness, only
later ones give more interesting features resulting from more complex
interactions of bands (notably differences) -- loadings usually have the
same sign for the first PC, and mixed signs for later PC's. John C.
Davis in "Statistics and Data Analysis in Geology" called this the
"size and shape effect". The most interesting PC's may have an EV smaller
than 1, when they come from correlation matrices. Geochemists don't shy
away from interpreting 7 or more factors.

The question is not the number of factors, but what criteria to use to select and interpret the resulting PCs. What makes a PC interesting can be the amount of explained variance, but also the dominant variables in it. BTW, some textbooks recommend to use only rotated PCs if a rotation could be performed.

In a mathematical sense, the sign of the loadings is arbitrary because the absolute value as well as the result of a PCA will stay the same after new_var = -old_var. The same sign for the first PC and so on is not generally valid and with regard to imagery probably only applies to surface reflectance or radiation measured at the sensor, and I would guess is dependent on the number of bands and the wavelength captured by each.

All this is however far from the i.pca eigenvalue problem, going towards comments on the general use of PCAs for remote sensing and as such probably only of interest to the grass-stats ml.
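The remark about new_var = -old_var can be checked with a small Python sketch. It uses two made-up variables and the closed-form eigen-decomposition of a 2x2 covariance matrix (so it is an illustration only, not how i.pca or prcomp() compute their results). Negating one variable leaves the eigenvalues and the absolute loadings unchanged, while the first PC goes from same-sign to mixed-sign loadings.

```python
import math


def covariance_2d(xs, ys):
    """Return (sxx, sxy, syy), the sample covariance matrix entries."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return sxx, sxy, syy


def eigen_2x2(sxx, sxy, syy):
    """Eigenvalues (descending) and unit eigenvectors of [[sxx, sxy], [sxy, syy]].

    Assumes sxy != 0, i.e. the two variables are correlated.
    """
    centre = (sxx + syy) / 2.0
    radius = math.hypot((sxx - syy) / 2.0, sxy)
    values = (centre + radius, centre - radius)
    vectors = []
    for lam in values:
        vx, vy = sxy, lam - sxx  # solves (A - lam*I) v = 0 when sxy != 0
        norm = math.hypot(vx, vy)
        vectors.append((vx / norm, vy / norm))
    return values, vectors


a = [1.0, 2.0, 3.0, 4.0, 5.0]     # hypothetical variable
b = [2.1, 3.9, 6.2, 7.8, 10.1]    # hypothetical, positively correlated with a
neg_b = [-v for v in b]           # new_var = -old_var

values_pos, vectors_pos = eigen_2x2(*covariance_2d(a, b))
values_neg, vectors_neg = eigen_2x2(*covariance_2d(a, neg_b))
# values_pos equals values_neg; loadings differ only in sign.
```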


_______________________________________________
grass-stats mailing list
[email protected]
http://lists.osgeo.org/mailman/listinfo/grass-stats
