Dear Philipp (and others),

Your comment raises an interesting issue of dimension reduction. Crucial 
here is the choice of the number of dimensions (PCs) to be retained. 
Sometimes this is quite clear from the scree plot, but it can easily happen 
that the scree plot supports multiple choices, and it would be nice to have a 
numerical method assisting the decision. If variation within the sample were 
isotropic, no eigenvalue would be consistently larger than the 
corresponding eigenvalue in a number of random draws from the isotropic 
distribution. This distribution is a poor null model for shape variation, 
however, and it would support retention of too many PCs. More sophisticated 
bootstrap strategies can be found in the literature, but to be applied to GM 
data they would have to respect dependencies imposed by spatial arrangement 
of landmarks. Is there any approach you would recommend for this purpose?
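
A minimal sketch of the isotropic comparison described above, in Python with
NumPy. The function name, the 95% quantile and the number of draws are
illustrative assumptions. The null used here is the same isotropic Gaussian
that is criticised above as too permissive for shape data, so the count it
returns is likely to be too generous for real landmark data.

```python
import numpy as np

def pcs_exceeding_isotropic_null(X, n_draws=200, quantile=0.95, seed=0):
    """Count the leading PCs whose eigenvalue exceeds the chosen quantile of
    the corresponding eigenvalue under an isotropic Gaussian null."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Eigenvalues of the sample covariance matrix, largest first.
    observed = np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]
    # Isotropic null: same total variance, spread evenly over all p variables.
    sigma = np.sqrt(observed.sum() / p)
    null = np.empty((n_draws, p))
    for i in range(n_draws):
        Z = rng.normal(scale=sigma, size=(n, p))
        null[i] = np.linalg.eigvalsh(np.cov(Z, rowvar=False))[::-1]
    exceeds = observed > np.quantile(null, quantile, axis=0)
    # Stop at the first PC that fails the comparison.
    return int(p if exceeds.all() else np.argmin(exceeds))

# Toy data (not landmark data): two strong factors plus small isotropic noise.
rng = np.random.default_rng(1)
loadings, _ = np.linalg.qr(rng.normal(size=(10, 2)))
X = rng.normal(scale=3.0, size=(200, 2)) @ loadings.T
X += rng.normal(scale=0.1, size=X.shape)
print(pcs_exceeding_isotropic_null(X))
```

A null that respects the spatial dependencies among landmarks would replace
only the line that draws Z; the rest of the comparison stays the same.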


On Tuesday, May 28, 2019 at 8:38:02 PM UTC+2, wrote:
> Dear all,
> I also want to comment on the recent bgPCA postings.
> Andrea et al. and Fred are right that bgPCA produces ordination plots in 
> which two or more groups are discriminated more (i.e., the groups overlap 
> less) than they should, whenever p (number of variables) is large relative 
> to n (sample size). Thanks Andrea for noticing that, or whoever figured it 
> out first; it was not me, admittedly. In the case of samples from the same 
> distribution (i.e., no "real" group differences), the samples can even 
> appear to be distinct if p is larger than n. This phenomenon is much more 
> severe in CVA than in bgPCA (as we showed in the 2011 paper), but we were 
> not aware back then that it is also present in bgPCA. Please note that this 
> does NOT mean that ALL results inferred from bgPCA are wrong, only those 
> about group separation are biased; the relationship between group means in 
> bgPCA is necessarily the same as in an ordinary PCA (but see below).
> I have two main comments and pieces of advice.
> 1) The simulations of identical independent noise for an increasing number 
> of variables, as in Fred's current manuscript and in our 2011 paper, are 
> not quite realistic because morphometric variables are highly correlated; 
> the "real" degrees of freedom thus are much less than the number of 
> variables. Put another way, if you set more and more landmarks on a sample 
> of specimens, not every landmark introduces a new degree of freedom because 
> its location may be predictable from the adjacent landmarks. Theoretically, 
> there is a maximal number of degrees of freedom in a given sample that 
> reflects the actual spatial scale of the shape differences studied. If the 
> given shape differences are captured well by the current landmark set, 
> adding more landmarks will not add any further information and not increase 
> the relevant degrees of freedom. For example, if shape variation comprises 
> only affine shape variation (linear scaling and shearing), the relevant 
> shape space has only two degrees of freedom (two dimensions), regardless 
> of how many landmarks were measured.
> As a result of this, most morphometric data, even those consisting of many 
> landmarks, can be described well by a small number of principal components, 
> as we all know. Ideally, these few PCs capture the "real" dimensionality of 
> shape space (i.e., they are some rotation of the underlying factor 
> structure), which is much less than the number of landmarks. In practice, 
> the problem is that every landmark entails some small independent 
> measurement error, and hence the "cut-off" for the number of dimensions is 
> not necessarily obvious. In the above example with only affine shape 
> variation, for more than three landmarks there will still be more than two 
> PCs with non-zero variance, but hopefully the first two PCs are a good 
> estimate of the affine components. Methods other than ordinary PCA may do a 
> better job for this task, e.g. methods that take into account spatial 
> scale, such as the spatially weighted relative warps in Bookstein's orange 
> book or the relative intrinsic warps in Bookstein (2015). Blame Fred for 
> these names ;-) 
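
The affine example above can be checked numerically. The following is only a
sketch under simplifying assumptions: 2D landmarks, small linearised
deformations, a random mean configuration, and no Procrustes superimposition.
All function names and the noise level are hypothetical.

```python
import numpy as np

def affine_sample(k, n, noise_sd=0.0, seed=0):
    """Simulate n configurations of k 2D landmarks that differ only by a
    small stretch (a) and shear (b), plus optional independent noise."""
    rng = np.random.default_rng(seed)
    mean_shape = rng.normal(size=(k, 2))
    mean_shape -= mean_shape.mean(axis=0)  # centre at the origin
    rows = []
    for _ in range(n):
        a, b = rng.normal(scale=0.05, size=2)
        # Symmetric, trace-free perturbation: it contains no rotation and
        # no uniform scaling term, only the two affine shape components.
        A = np.eye(2) + np.array([[a, b], [b, -a]])
        config = mean_shape @ A + rng.normal(scale=noise_sd, size=(k, 2))
        rows.append(config.ravel())
    return np.array(rows)

def pc_variances(X):
    """Eigenvalues of the sample covariance matrix, largest first."""
    return np.linalg.eigvalsh(np.cov(X, rowvar=False))[::-1]

def n_nonzero(values, tol=1e-10):
    """Number of eigenvalues that are non-zero relative to the largest."""
    return int(np.sum(values > tol * values[0]))

for k in (4, 10, 30):
    clean = pc_variances(affine_sample(k, n=100))
    noisy = pc_variances(affine_sample(k, n=100, noise_sd=0.002))
    share = noisy[:2].sum() / noisy.sum()
    print(k, n_nonzero(clean), n_nonzero(noisy), round(share, 3))
```

Without noise the number of non-zero PCs stays at two however many landmarks
are used. With independent landmark noise every PC has some variance, and the
cut-off has to be read from the size of the eigenvalues.
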
> Many multivariate statistical analyses - including bgPCA, CVA, relative 
> PCA, and also the computation of shape distances or angles between shape 
> trajectories, etc. - should be performed within this subspace (i.e., based 
> on the first few PCs rather than on the original shape coordinates). bgPCA 
> and CVA may be considered kinds of factor rotation within this subspace 
> rather than methods of variable reduction.
> Hence, many of the problems described by Andrea et al. and Fred can be 
> avoided by variable reduction (ordinary PCA) prior to bgPCA and related 
> techniques. This requires a careful inspection of the scree plot and the 
> corresponding PCs. The actual sample size must be large relative to the 
> number of PCs retained (not necessarily relative to the number of 
> landmarks).
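
A sketch of the effect of reducing the variables by ordinary PCA before
computing the between-group axis. Assumptions: two groups drawn from the same
isotropic distribution (so any separation is spurious), NumPy, hypothetical
function names, and an arbitrary choice of five retained PCs. For two groups
the between-group PC is simply the direction joining the two group means.

```python
import numpy as np

def bgpc_separation(X, groups):
    """Squared standardised distance between two group means along the
    single between-group PC (the axis joining the two group means)."""
    g0, g1 = X[groups == 0], X[groups == 1]
    diff = g1.mean(axis=0) - g0.mean(axis=0)
    axis = diff / np.linalg.norm(diff)
    s0, s1 = g0 @ axis, g1 @ axis
    # Pooled within-group variance of the scores on the bgPC.
    pooled_var = (
        s0.var(ddof=1) * (len(s0) - 1) + s1.var(ddof=1) * (len(s1) - 1)
    ) / (len(s0) + len(s1) - 2)
    return (s1.mean() - s0.mean()) ** 2 / pooled_var

def reduce_by_pca(X, n_pcs):
    """Scores on the first n_pcs principal components of the pooled sample."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_pcs].T

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 200))        # 40 specimens, 200 variables, no signal
groups = np.repeat([0, 1], 20)        # two arbitrary groups of 20
full = bgpc_separation(X, groups)
reduced = bgpc_separation(reduce_by_pca(X, 5), groups)
print(round(full, 2), round(reduced, 2))
```

With real data the number of retained PCs would come from inspecting the
scree plot and the PCs themselves, not from a fixed value as used here.
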
> 2) Many applications of PCA or CVA aim to combine multiple analytical 
> steps that are not necessarily commensurate: 
> - Exploratory study of group mean differences
> - Relating multivariate mean differences across multiple groups by an 
> ordination analysis
> - Discrimination analysis (studying if and to what degree groups overlap 
> in their distribution of individual variation)
> - Perhaps even the estimation of a discrimination function, i.e., a 
> combination of variables that maximally discriminates the groups.
> The value or burden of having many landmarks is different for each of 
> these tasks.
> When "exploring" differences in average shape between groups, without 
> strong prior expectations (i.e., without knowing where the signal is), it 
> is clearly useful to measure as many landmarks as possible, as this 
> increases spatial resolution. In contrast to Andrea, I think that 
> "beautiful pictures" can be of value because morphology is a visual 
> discipline, after all. For computing group means or shape regressions, p>n 
> is no problem. The challenge in this step is to judge whether the observed 
> differences are scientifically relevant, which may (but often does not) 
> include the assessment of statistical significance. An excess of variables 
> over cases can challenge statistical significance testing (multivariate 
> parametric tests require full rank data and n>>p; for shape coordinates 
> this ALWAYS requires dimension reduction, even for three landmarks). 
> Only if group means really differ does it make sense to relate multivariate 
> group mean differences by an ordination analysis. This requires an 
> interpretable metric (a "distance" function such as Procrustes distance), 
> which is itself challenging and can constrain the geometric structures that 
> are interpretable (e.g., Mitteroecker & Huttegger 2009, Huttegger & 
> Mitteroecker 2011). Technically, this step sets no limits to the number of 
> variables, but for normally distributed variables the expected value of a 
> Euclidean distance increases linearly with the square root of the number of 
> variables (chi distribution). This is no problem per se, but for small 
> signals and many variables, the summed noise in the many landmarks can 
> dominate the small signal. Also, this leads to a somewhat paradoxical 
> situation: even if for each variable the estimated sample average is close 
> to the population mean, the Euclidean distance between the multivariate 
> sample average and the multivariate population mean increases with p. This 
> relationship is also the reason why bgPC scores show too much group 
> separation if p is large: the more variables, the larger the distance 
> between group means, even though the within-group variances stay the same 
> (for two groups, the squared Mahalanobis distance for the bgPC is approx. 
> 2p/n). 
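
Both statements in the paragraph above can be illustrated with a small
simulation using only the Python standard library. This is a sketch under
stated assumptions: independent standard normal variables, two groups with no
true difference, and n read as the sample size per group. The function names
are hypothetical.

```python
import math
import random
import statistics

def draw(n, p, rng):
    """n independent draws from a p-variate standard normal distribution."""
    return [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]

def column_means(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def mean_distance(p, n, rng):
    """Euclidean distance between the sample mean of n draws and the
    population mean (the origin)."""
    return math.sqrt(sum(m * m for m in column_means(draw(n, p, rng))))

def bgpc_d2(p, n, rng):
    """Squared standardised distance between two pure-noise groups of size n
    along the axis joining their sample means."""
    g0, g1 = draw(n, p, rng), draw(n, p, rng)
    diff = [b - a for a, b in zip(column_means(g0), column_means(g1))]
    norm = math.sqrt(sum(d * d for d in diff))
    axis = [d / norm for d in diff]
    s0 = [sum(x * u for x, u in zip(row, axis)) for row in g0]
    s1 = [sum(x * u for x, u in zip(row, axis)) for row in g1]
    pooled_var = (statistics.variance(s0) + statistics.variance(s1)) / 2
    return (statistics.mean(s1) - statistics.mean(s0)) ** 2 / pooled_var

rng = random.Random(0)
for p in (10, 100, 1000):
    # Simulated distance next to the approximate expectation sqrt(p / n).
    print(p, round(mean_distance(p, 20, rng), 2), round(math.sqrt(p / 20), 2))
# Simulated squared distance on the bgPC next to 2p/n.
print(round(bgpc_d2(200, 20, rng), 1), 2 * 200 / 20)
```

Each coordinate of the sample mean stays close to zero, yet the distance of
the whole mean vector from the origin grows roughly as sqrt(p / n).
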
> Perhaps more important than the _number_ of variables is the spatial 
> distribution of landmarks on the organism. E.g., structures covered by many 
> landmarks more strongly affect the multivariate distance than structures 
> covered by fewer landmarks. Semilandmarks may or may not be helpful in this 
> regard for a comprehensive coverage of organisms. 
> Discrimination analysis (DA) aims at assessing the success of 
> classification. Classification and discrimination require the estimation of 
> variance for every linear combination of the variables and thus n>>p (in 
> most multivariate settings, this implies prior variable reduction). 
> Reliable DA also requires a cross-validation approach and cannot be 
> inferred without bias from a standard ordination analysis: PCA tends to 
> underestimate classification success (group separation), whereas bgPCA, and 
> even more so CVA, tends to overestimate it. DA can be considered an 
> exploratory approach, but it only makes sense if group means are known to 
> differ.
> Discriminant function analysis (DFA), and its extension to multiple groups 
> (CVA), estimate linear combinations of the measured variables that 
> _maximize_ group separation and, hence, classification success. This goes 
> beyond the exploratory analysis of group differences and does not 
> necessarily need to be combined with an ordination analysis. Classically, it 
> is used to derive a simple linear combination of the variables for efficient 
> classification. It is well known for these methods that the within-sample 
> classification success is a highly biased estimate of the out-of-sample 
> classification success; hence the need for cross-validation.
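
The gap between within-sample and out-of-sample classification success can
be sketched with the standard library alone. Note the substitution: instead
of DFA or CVA, the code uses a plain nearest-group-mean classifier with
Euclidean distance, which is enough to show the bias. The data are pure noise
with many more variables than specimens, and all names are hypothetical.

```python
import random

def group_means(X, y, n_groups):
    """Mean vector of each group."""
    means = []
    for g in range(n_groups):
        rows = [x for x, label in zip(X, y) if label == g]
        means.append([sum(col) / len(rows) for col in zip(*rows)])
    return means

def nearest_mean_label(x, means):
    """Index of the group mean closest to x (squared Euclidean distance)."""
    dists = [sum((xi - mi) ** 2 for xi, mi in zip(x, m)) for m in means]
    return dists.index(min(dists))

def resubstitution_accuracy(X, y, n_groups=2):
    """Classify every specimen with means that include that specimen."""
    means = group_means(X, y, n_groups)
    hits = sum(nearest_mean_label(x, means) == label for x, label in zip(X, y))
    return hits / len(X)

def leave_one_out_accuracy(X, y, n_groups=2):
    """Classify every specimen with means computed without that specimen."""
    hits = 0
    for i in range(len(X)):
        means = group_means(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_groups)
        hits += nearest_mean_label(X[i], means) == y[i]
    return hits / len(X)

rng = random.Random(0)
X = [[rng.gauss(0.0, 1.0) for _ in range(500)] for _ in range(30)]
y = [0] * 15 + [1] * 15               # arbitrary labels, no true difference
print(resubstitution_accuracy(X, y), leave_one_out_accuracy(X, y))
```

Because the labels carry no information, any within-sample success well
above chance reflects only that each specimen helped define its own group
mean.
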
> No single method can do all these steps well. The choice of method and 
> also the choice of landmarks really depend on the biological question and 
> the prior knowledge or hypotheses. If discrimination or classification is 
> the primary aim, cross-validation is indispensable; an ordination analysis 
> is not sufficient, perhaps not even necessary. If the signal (morphological 
> difference) is known prior to the analysis, not many landmarks are 
> necessary. Without any prior expectation, a dense landmark set may be 
> necessary to explore shape variation. But this sets fundamental limits to 
> studies of discrimination and classification; there is a kind of 
> "uncertainty principle": for a given sample size, you cannot observe 
> arbitrarily high spatial resolution (number of variables) and the exact 
> discrimination of groups (classification success) at the same time. 
> Best,
> Philipp M.

You received this message because you are subscribed to the Google Groups 
"MORPHMET" group.