Edzer Pebesma wrote:
Markus Metz wrote:
I think scale and normalize are two different things.
I believe that in statistics these two words don't have a generally
accepted definition. They're useful as long as you explain what you mean
by them.
At least in the statistics literature I use, these two methods are defined differently. Scaling is like r.rescale, while normalization converts the data to a mean of 0 and a stddev of 1, i.e. to the location and spread of a standard normal distribution. But usually I wouldn't worry too much about the terms as long as it is explained what they mean.
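To make the two definitions used in this reply concrete, here is a minimal sketch in plain Python. The function names `rescale` and `standardize` and the sample values are assumptions made for illustration; `rescale` is only a rough analogue of what r.rescale does to a raster's value range, not its actual implementation.

```python
import statistics


def rescale(values, new_min=0.0, new_max=1.0):
    """Linearly map values onto [new_min, new_max] (min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot rescale constant data")
    factor = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * factor for v in values]


def standardize(values):
    """Subtract the mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardize constant data")
    return [(v - mean) / sd for v in values]


# Hypothetical sample values, chosen so that mean = 5 and stddev = 2.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

scaled = rescale(data)        # values now span the range 0..1
zscores = standardize(data)   # values now have mean 0 and stddev 1

print(scaled)
print(zscores)
```

Both functions are linear transformations, so the ordering of the values is preserved; only location and spread change.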
Well, PCA only captures covariance or correlation, meaning linear
relationships, and it may be the case that the most interesting features
are non-linear.
So if a PCA does not capture non-linear relationships, I don't see how it could help to use PCs that explain nearly no variation in the dataset. You could instead do e.g. a log transform first, or whatever else is appropriate to convert the suspected type of non-linear relation into a linear one, and then feed the transformed variables to a PCA.
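The effect of such a transform can be sketched with a small synthetic example. The data and the helper `pearson` are assumptions for illustration only: an exponential relation between two variables is only partly visible to a linear correlation (which is what a PCA works from), while the log-transformed variable is linearly related to the other one.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Synthetic, noise-free exponential relation (hypothetical data).
x = [float(i) for i in range(1, 21)]
y = [math.exp(0.5 * xi) for xi in x]

r_raw = pearson(x, y)                           # linear correlation on raw values
r_log = pearson(x, [math.log(v) for v in y])    # after the log transform

print(f"raw: {r_raw:.3f}  log-transformed: {r_log:.3f}")
```

With real data the right transform depends on the suspected form of the relation; the log is just one example.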
For instance, NDVI is the ratio of a difference over a sum,
(NIR - Red) / (NIR + Red), which cannot be expressed as a linear
combination of bands.
Not directly, but being a normalized difference (it should really be called standardised, not normalized) it can be approximated with linear combinations, i.e. there is at least some correlation between the raw bands and a normalized difference calculated from them.
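A rough way to check this claim is to compute NDVI = (NIR - Red) / (NIR + Red) on synthetic reflectances and correlate it with the raw bands. Everything in the sketch below is an assumption for illustration: the reflectance ranges, the regular grid of band combinations, and the helper `pearson`. Real imagery will give different numbers.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical reflectance levels: red 0.05..0.30, NIR 0.30..0.60.
red_levels = [0.05 + 0.025 * i for i in range(11)]
nir_levels = [0.30 + 0.03 * j for j in range(11)]

# Full grid of band combinations, so red and NIR are uncorrelated by design.
red = [r for r in red_levels for _ in nir_levels]
nir = [n for _ in red_levels for n in nir_levels]
ndvi = [(n - r) / (n + r) for r, n in zip(red, nir)]

r_red = pearson(red, ndvi)
r_nir = pearson(nir, ndvi)

# Because the two predictors are uncorrelated on this grid, the share of NDVI
# variance explained by the best linear combination of the two bands is
# (approximately) the sum of the squared correlations.
r2_linear = r_red ** 2 + r_nir ** 2

print(f"corr(red, NDVI) = {r_red:.3f}")
print(f"corr(NIR, NDVI) = {r_nir:.3f}")
print(f"R^2 of linear approximation = {r2_linear:.3f}")
```

On this synthetic grid the linear combination recovers most, but not all, of the NDVI variation; the remainder is the non-linear part that a PCA on the raw bands would not see.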
The first PC(s) usually express brightness, only
later ones give more interesting features resulting from more complex
interactions of bands (notably differences) -- loadings usually have the
same sign for the first PC, and mixed signs for later PCs. John C.
Davis in "Statistics and Data Analysis in Geology" called this the
"size and shape effect". The most interesting PCs may have an EV smaller
than 1, when they come from correlation matrices. Geochemists don't shy
away from interpreting 7 or more factors.
The question is not the number of factors, but what criteria to use to select and interpret the resulting PCs. What makes a PC interesting can be the amount of explained variance, but also the dominant variables in it. BTW, some textbooks recommend using only rotated PCs if a rotation can be performed. In a mathematical sense, the sign of the loadings is arbitrary: after new_var = -old_var the absolute values of the loadings, as well as the result of the PCA, stay the same. That the loadings of the first PC all have the same sign is not generally valid; with regard to imagery it probably only applies to surface reflectance or radiation measured at the sensor, and I would guess it depends on the number of bands and the wavelengths captured by each. All this is, however, far from the i.pca eigenvalue problem; it is heading towards comments on the general use of PCA for remote sensing and as such is probably only of interest to the grass-stats mailing list.
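The point about the sign of the loadings can be illustrated with a two-variable PCA worked out by hand. This is only a sketch under stated assumptions: the two "bands" are made-up numbers, `cov2` and `pca2` are hypothetical helpers using the closed-form eigen-decomposition of a 2x2 covariance matrix, and none of it reflects how i.pca is implemented.

```python
import math


def cov2(xs, ys):
    """Sample variances and covariance of two variables: (sxx, sxy, syy)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    return sxx, sxy, syy


def pca2(xs, ys):
    """Eigenvalues (descending) and unit loadings of the first PC.

    Uses the closed form for a symmetric 2x2 matrix [[a, b], [b, c]];
    assumes the covariance b is non-zero.
    """
    a, b, c = cov2(xs, ys)
    half_trace = (a + c) / 2
    root = math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    ev1, ev2 = half_trace + root, half_trace - root
    vx, vy = b, ev1 - a            # eigenvector belonging to ev1
    norm = math.hypot(vx, vy)
    return (ev1, ev2), (vx / norm, vy / norm)


# Two made-up, positively correlated "bands".
band1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
band2 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

evs, loadings = pca2(band1, band2)
# Same data, but with new_var = -old_var applied to band2.
evs_flipped, loadings_flipped = pca2(band1, [-v for v in band2])

print("eigenvalues:", evs, "loadings:", loadings)
print("eigenvalues:", evs_flipped, "loadings:", loadings_flipped)
```

The eigenvalues are identical in both runs and the loadings have the same absolute values, but the first PC has same-sign loadings in one case and mixed-sign loadings in the other, so the sign pattern depends on how the variables happen to be oriented.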

_______________________________________________
grass-stats mailing list
[email protected]
http://lists.osgeo.org/mailman/listinfo/grass-stats
