Re: [MORPHMET2] Measurement error in geometric morphometrics

andrea cardini Mon, 07 Nov 2022 06:08:57 -0800

Hi Mike,
you wrote something I am sure I misunderstood:

"small measurement error can look larger when individuals are not that different in shape or large measurement error can look small if they are"

ME is relative and scaling it to the variation in the sample is precisely what we need to do: if 'true' variation is large (say, in a macroevolutionary analysis with many different species and genera), a certain amount of ME might be negligible; if 'true' variation is small (with the same configuration, I may be studying small geographic variation among closely related populations), that same amount of ME could be too much. That will be the same for any type of measurement (CS, shape, traditional morphometrics).

Your simulation seems to aim at relating the relative estimate of ME I get from the Procrustes ANOVA (the two R2s) to absolute variation due to digitization error on the original specimens. Am I correct?

Philipp's point, in my interpretation, was about something else, which is to focus on whether ME biases results in relation to the very specific study question(s). If random, he argued, I could tolerate a very large ME and potentially even an ME that is larger than variation among individuals in a sample. For instance, he says that, if ME is random, it will end up in the last PCs and, if I use only the first ones, results will be unaffected.

Even if I see the point from a statistical perspective, from a biological one, I worry about analyzing data that are mostly noise. If that's the only option, I might do it with due acknowledgement of the problem and big caveats: can I really be sure that such a huge error is not affecting results in subtle ways? Is the error in shape really random after the superimposition? For the within-configuration analyses of modularity/integration it certainly is not, as I showed with my simulations of isotropic variance: the bias will always be there, sometimes negligible and sometimes not (who knows?). Even with a simple PCA when p/N is huge, especially with slid semilandmarks, I get a pattern (dominant PC1) from isotropic noise.

Again assuming I am correct, Philipp suggests looking for structure in the error component (the differences between replicates) to understand if it may bias results for the specific question I am asking. As I said, I really like this idea but would like to see a variety of examples in publications and I am not sure how easy it is to do this objectively (not purely based on judgement) and exhaustively. I must be able to detect the effect of a potential bias on all the questions I am investigating (there may be many) as well as all the assumptions of the models in the analyses. I think Philipp mentioned path analysis as an option, which is something I'll have to learn more about.

I feel more confident, for now, with Philipp's other point, which is the basic rationale of my approach:

"one can argue that if measurement error is very small, then randomness and homogeneity across groups are less of an issue. But in this case the error really needs to be negligibly small, not just smaller than the individual variation"

This is why I tend to use a fairly conservative approach and I am not happy with relying on a P value. The R2 of the effect I am interested in must be much larger than that of ME and replicates should cluster tightly around each individual. If the design used for collecting the replicates is accurate, that should take care of the bias. If not, it is tricky because I could have an apparently small ME with an important bias.

The example I mentioned of strong group structure in data collected in two chunks shows what happens with a flawed design: the colleague who measured the skull was confident that ME was negligible because he tested it in the first chunk of data collection; however, the way to assess ME correctly would have been having at least a subsample of the same individuals measured in both chunks of data collection. That would have included the effect of the long time between the data collections as well as the different Microscribe used for landmarking. I bet results would have been different (much larger ME relative to sample variance) from the original assessment of ME in the first data collection. With a well-designed set of replicates, in fact, we could have estimated the different sources of error, found where the problem was and maybe tried a correction. Unfortunately, he had replicates only for the first dataset.

The way I found the problem is the same philosophy as in Philipp's message and, with limitations, that may sometimes work even without replicates: there was a pattern in the data (clear group separation on PC1 in relation to the time of data collection) and the most parsimonious explanation was ME. In the end, the bias was obvious and the data could not be used.


Nice discussion.
Cheers

Andrea





On 04/11/2022 18:59, Mike Collyer wrote:

I agree with Philipp’s main point that it can be dangerous to quantify 
measurement error as a value based on (likely a ratio including) the variation 
among individuals on which the variation between repeated digitizations is also 
made, if it is not clear how variable those individuals are.  I was seeking 
some examples to demonstrate that small measurement error can look larger when 
individuals are not that different in shape or large measurement error can look 
small if they are.  I was not very successful before Philipp responded.  
However, I did play with the “mosquito” data set in geomorph, which led me in a 
different direction.  I chose this data set because it contains two replicate 
configurations for each individual.

For context, here is the analysis I considered:

library(geomorph)
data("mosquito")

# use just one side for demonstration
# residual SS can be considered basis for measurement error

lmks <- mosquito$wingshape[,, which(mosquito$side == 1)]
ind <- mosquito$ind[ which(mosquito$side == 1)]
GPA <- gpagen(lmks, print.progress = FALSE)
summary(procD.lm(coords ~ ind, data = GPA))


Analysis of Variance, using Residual Randomization
Permutation procedure: Randomization of null model residuals
Number of permutations: 1000
Estimation method: Ordinary Least Squares
Sums of Squares and Cross-products: Type I
Effect sizes (Z) based on F distributions

           Df       SS        MS     Rsq      F      Z Pr(>F)
ind        9 0.069286 0.0076984 0.62764 1.8729 2.6261  0.006 **
Residuals 10 0.041105 0.0041105 0.37236
Total     19 0.110390
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Call: procD.lm(f1 = coords ~ ind, data = GPA)

It might be alarming that the residual Rsq is 0.37236, which is the portion of 
variation attributed to multiple measurements on the same individuals.  That 
might seem high.  I grew quickly tired of searching for a similar data set with 
contrasting results and decided that maybe I could just simulate measurement 
error and ask if the residual SS here was large compared to what I simulated.  
I thought about this as a process and came to the conclusion that one could 
simulate landmark wobble (like a shaky hand) by making the standard deviation 
of wobble sampled from a normal distribution proportional to a fraction of the 
centroid size.  For example, a 5% error could mean that the standard deviation 
for the distribution from which a random value is sampled (x, y, or z 
coordinate) is 0.05 * CS for that configuration.  (The shakiness scales with 
the size of the object.)

I ended up making a function that could simulate a measurement error outcome.  
Here is the function, in case anyone might find it useful (I have not tested 
this, so please expect clunkiness…).  One adds a set of coordinates (assumed to 
be a 3d array), the number of replicates to simulate (the observed counts as 
1), and the percentage of centroid size to use to vary the sd of a random 
sample from a normal distribution.  It performs ANOVA for the simulated data 
(following GPA).


makeME <- function(coords, reps = 2, per.error = 0.05){
   # per.error means sd = per.error * Csize
   if(reps < 2)
     stop("Must have more than 1 replicate to run this.\n")
   dims <- dim(coords)
   n <- dims[3]   # number of configurations
   p <- dims[1]   # number of landmarks
   k <- dims[2]   # number of dimensions

   nms <- dimnames(coords)[[3]]
   if(is.null(nms)) nms <- paste("spec", 1:n, sep = "")
   Coords <- lapply(1:n, function(x) as.matrix(coords[,, x]))
   nnms <- paste(rep(nms, each = reps), 1:reps, sep = ".rep")

   # repeat each configuration reps times; the first copy is left unperturbed
   newCoords <- rep(Coords, each = reps)
   names(newCoords) <- nnms
   initGPA <- gpagen(coords, print.progress = FALSE, max.iter = 1)
   Csize <- rep(initGPA$Csize, each = reps)
   err <- rep(c(0, rep(per.error, reps - 1)),  n)
   for(i in 1:length(err))
     newCoords[[i]] <- newCoords[[i]] + rnorm(p * k, sd = err[i] * Csize[i])
   newCoords <- simplify2array(newCoords)

   GPA <- gpagen(newCoords, print.progress = FALSE)
   ind <- factor(rep(1:n, each = reps))

   return(summary(procD.lm(coords ~ ind, data = GPA)))
}

And as an example application, using the same data as above:

makeME(mosquito$wingshape[,, which(mosquito$side == 1)])


Analysis of Variance, using Residual Randomization
Permutation procedure: Randomization of null model residuals
Number of permutations: 1000
Estimation method: Ordinary Least Squares
Sums of Squares and Cross-products: Type I
Effect sizes (Z) based on F distributions

           Df      SS       MS     Rsq      F      Z Pr(>F)
ind       19 0.91707 0.048267 0.56455 1.3647 3.2186  0.002 **
Residuals 20 0.70736 0.035368 0.43545
Total     39 1.62442
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Call: procD.lm(f1 = coords ~ ind, data = GPA)

So I might conclude from this that if I allowed my digitizing to vary by 5% of 
centroid size, it appears my observed digitization has a measurement error less 
than that, which might help me to feel confident.  In case I worry that this 
one random outcome is not fully representative, the following function allows 
me to run many simulations (100 as an example)


simulate.makeME <- function(coords, reps = 2, per.error = 0.05, nsims = 100) {
   result <- sapply(1:nsims, function(j) {
     cat("sim:", j, "... ")
     res <- makeME(coords, reps, per.error)
     res$table$Rsq[2]}
     )
   cat("\n\n")
   names(result) <- paste("sim", 1:nsims, sep = ".")
   result
}

ME.sims <- simulate.makeME (mosquito$wingshape[,, which(mosquito$side == 1)], 
reps = 2, per.error = 0.05, nsims = 100)
summary(ME.sims) # just Rsq

    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.4264  0.4423  0.4474  0.4476  0.4533  0.4729

So now I feel really confident that measurement error is probably not a worry, 
based on results from a process that imposes a certain level of measurement 
error.

I might also start to wonder when imposing the randomness starts to approach 
what I see in my empirical example.

makeME(mosquito$wingshape[,, which(mosquito$side == 1)], per.error = 0.03)


Analysis of Variance, using Residual Randomization
Permutation procedure: Randomization of null model residuals
Number of permutations: 1000
Estimation method: Ordinary Least Squares
Sums of Squares and Cross-products: Type I
Effect sizes (Z) based on F distributions

           Df      SS       MS     Rsq      F      Z Pr(>F)
ind       19 0.49153 0.025870 0.62935 1.7873 5.7972  0.001 **
Residuals 20 0.28948 0.014474 0.37065
Total     39 0.78101
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Call: procD.lm(f1 = coords ~ ind, data = GPA)

These results mimic my observed empirical results pretty well.  Maybe I can 
infer from this that my digitizing could be off by as much as 3% and produce 
results like I observed?

This is a different way of approaching the problem than calculating and trying 
to make sense of a statistic that might resemble an effect size, but it feels 
more informative to me.  I am not sure that it is smart to scale the amount of 
variation with centroid size — one might have large and small individuals but 
can zoom in or out to better capture landmark locations — so the function could 
be rewritten to not include centroid size as a variable.  This was done so that 
the simulated error was made for digitized specimens, but could be done on 
configurations already constrained to be unit size (after GPA).  I am also not 
sure that it is smart to sample from a normal distribution.  Maybe sampling 
from a uniform distribution would better resemble digitizing shakiness.  I only 
wandered so far into the weeds with this.
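
The two alternatives raised above (no centroid-size scaling, uniform rather than normal noise) could be sketched as follows. This is only an illustration in base R, not part of the function posted above: `wobble`, `sd_abs` and `dist` are hypothetical names, and the toy matrix stands in for one p x k landmark configuration.

```r
# Hypothetical helper: add digitizing "wobble" to one p x k landmark matrix.
# sd_abs is an absolute standard deviation in the units of the coordinates
# (no centroid-size scaling); dist selects the noise distribution.
wobble <- function(M, sd_abs, dist = c("normal", "uniform")) {
  dist <- match.arg(dist)
  n_val <- length(M)
  noise <- if (dist == "normal") {
    rnorm(n_val, mean = 0, sd = sd_abs)
  } else {
    # a uniform on [-h, h] has sd = h / sqrt(3), so h = sd_abs * sqrt(3)
    h <- sd_abs * sqrt(3)
    runif(n_val, min = -h, max = h)
  }
  M + matrix(noise, nrow = nrow(M), ncol = ncol(M))
}

set.seed(1)
M <- matrix(rnorm(20), nrow = 10, ncol = 2)   # toy 10-landmark 2D configuration
M_unif <- wobble(M, sd_abs = 0.1, dist = "uniform")
M_norm <- wobble(M, sd_abs = 0.1, dist = "normal")
max(abs(M_unif - M))   # uniform wobble is bounded by sd_abs * sqrt(3)
```

Matching the standard deviation of the two distributions keeps the simulated error comparable in magnitude; the uniform version simply rules out the occasional large displacement that a normal distribution allows.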

I think this might qualify as an additional exploratory approach and agree with 
Philipp that making sense of the magnitude and directions between repeated 
measures, even if only viewed in a PC plot, is rather important.  I’m sure this 
could be improved if someone wants to play more with other data sets.

Cheers!
Mike

On Nov 4, 2022, at 10:38 AM, [email protected] <[email protected]> wrote:

Dear all,

I would like to challenge this view on measurement error, as summarized by Andrea, a
bit more generally.

Clearly, measurement error should be "small," but I disagree that "the idea is that
differences among individuals (averaged replicates) in a representative sample should be larger than
differences between replicates of the same individual". First, the between-individual variance (or mean
sum of squares, MSS) depends on the choice of individuals. For instance, if the sample comprises different
species, the MSS between individuals is much larger than for a sample of a single species, and the error MSS
in relation to the individual MSS is much smaller in the multi-species sample. Hence, whether or not the
error MSS is larger than the between-individual MSS is somewhat arbitrary and of secondary importance anyway.
"Controlling for main effects," as suggested by Andrea, is possible but it removes the actual
signal against which I may want to compare the error. In either case, the p-value of the MANOVA is
uninformative because the underlying H0 is irrelevant.
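
A minimal univariate sketch of this dependence on the sample, using simulated data in base R. All names and values here (`resid_rsq`, `sd_true`, `sd_me`, the sample sizes) are assumptions chosen for illustration; the same absolute measurement error is applied to a sample with little true variation and to one with much more.

```r
# Same absolute measurement error, two samples that differ in true variation.
set.seed(42)
n_ind <- 50   # individuals
n_rep <- 2    # replicate measurements per individual
sd_me <- 1    # measurement error sd, identical in both scenarios

resid_rsq <- function(sd_true) {
  ind <- factor(rep(seq_len(n_ind), each = n_rep))
  true_val <- rep(rnorm(n_ind, sd = sd_true), each = n_rep)
  y <- true_val + rnorm(n_ind * n_rep, sd = sd_me)
  tab <- anova(lm(y ~ ind))
  tab[["Sum Sq"]][2] / sum(tab[["Sum Sq"]])   # residual (error) share of total SS
}

rsq_single_species <- resid_rsq(sd_true = 1)    # little true variation
rsq_multi_species  <- resid_rsq(sd_true = 10)   # much true variation
c(single = rsq_single_species, multi = rsq_multi_species)
```

The error share of the total sum of squares drops sharply in the more variable sample even though the measurement error itself has not changed.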

In my opinion, it is more important that the error is unrelated to the signal of interest
("random"), rather than that it is small in terms of some summary statistic. For
instance, if in a growth study the measurement error is uncorrelated with the age effects, the
error "averages out" (if sample size is large enough) and does not bias the average
growth trajectory, even if the error is large. The same applies to group differences. MANOVA does
not inform about this independence. Moreover, it pools over all shape coordinates. For instance, it
does not inform us if the error is large for shape features of interest (those that differ between
groups or correlate with age, etc.) or for shape features of less interest.
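
The "averaging out" argument can be illustrated with a small simulation. This is a sketch under strong assumptions (a single trait, a linear growth trajectory with slope 0.5, normal error independent of age); none of these values come from a real data set.

```r
set.seed(11)
n <- 500
age <- runif(n, 0, 20)
trait_true <- 0.5 * age          # assumed linear growth trajectory
err <- rnorm(n, sd = 5)          # large error, generated independently of age
trait_obs <- trait_true + err

fit <- lm(trait_obs ~ age)
slope_hat <- unname(coef(fit)["age"])
slope_hat                        # estimate of the assumed slope of 0.5
summary(fit)$r.squared           # error inflates scatter around the trajectory
```

Under these assumptions the error adds scatter around the trajectory and widens the confidence interval of the slope, but it does not shift the slope systematically. If the error depended on age, this would no longer hold.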

Note also that most morphometric analyses are based on a few principal
components (or similar statistics) of the shape coordinates. PCs are linear
combinations, i.e., weighted averages, of the shape coordinates. Hence, group
means in a PC plot are averages over all cases AND all variables, so that
random error can be expected to be small. Another issue to consider: If
measurement error is indeed approximately isotropic, it has a similar magnitude
for all shape features (all directions of shape space). The individual
variance, however, typically is much greater for large-scale shape features
than for small-scale features, and the relative magnitude of measurement error
decreases with increasing spatial scale. PCs typically capture large-scale
shape variation, where the relative error is expected to be smaller. The same
applies to the symmetric vs. asymmetric components, the latter of which has
much smaller individual variance and hence greater relative measurement error.

The situation is slightly different in studies that compare shape variances,
not means, between groups, between symmetric and asymmetric components, or
among spatial scales. In contrast to mean estimates, measurement error does not
average out for these variance estimates. It is thus important that magnitude
and pattern of measurement error are constant (not necessarily small) across
groups or components so that observed differences in variance are attributable
to biological factors rather than systematic differences in measurement error.
Measurement error is most challenging when comparing entire variance-covariance
matrices. But again, MANOVA is not the way to assess homogeneity of measurement
error across groups.

If the sample is properly randomized before measurement, it is reasonable to
assume that measurement error is approximately uncorrelated with the signal of
interest. But there can be exceptions. For instance, younger and smaller
individuals can be harder to measure than older and larger individuals.
Measurement error can thus correlate with age. I discussed this in Mitteroecker
P, Stansfield E (2021) A model of developmental canalization, applied to human
cranial form. PLOS Computational Biology 17 (2): e1008381

Clearly, one can argue that if measurement error is very small, then randomness
and homogeneity across groups are less of an issue. But in this case the error
really needs to be negligibly small, not just smaller than the individual
variation.

Instead of somewhat meaningless scalar summary statistics (like the F-ratio or
some multivariate version of it), I thus prefer an exploratory approach. In the
simplest case, a PCA of the data, including the replicated specimens, can show
the magnitude and directionality of measurement error in relation to the signal
of interest (e.g., group differences, growth trajectories). Measurement error
can also be correlated with external variables (e.g., age) or compared among
groups, but to my knowledge little work has been done in this direction in
geometric morphometrics. Alternatives are errors-in-variables models and
structural equation models that implement estimates of measurement error in the
first place.
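
A rough sketch of such an exploratory check in base R, on simulated data. Everything here is an assumption for illustration: the matrices `rep1` and `rep2` stand in for two sets of aligned shape coordinates (one row per individual), the two groups and the error model are invented, and the final t-test is just one simple way to compare error along the signal between groups.

```r
set.seed(7)
n_ind <- 30; n_var <- 10
group <- factor(rep(c("A", "B"), each = n_ind / 2))

# Simulated "true" individuals; group B is shifted along variable 1
truth <- matrix(rnorm(n_ind * n_var), n_ind, n_var)
truth[group == "B", 1] <- truth[group == "B", 1] + 3

# Two replicate measurements per individual with isotropic error
rep1 <- truth + matrix(rnorm(n_ind * n_var, sd = 0.3), n_ind, n_var)
rep2 <- truth + matrix(rnorm(n_ind * n_var, sd = 0.3), n_ind, n_var)

# PCA of all measurements, replicates included
pca <- prcomp(rbind(rep1, rep2))
sc1 <- pca$x[seq_len(n_ind), 1:2]
sc2 <- pca$x[n_ind + seq_len(n_ind), 1:2]
plot(rbind(sc1, sc2), col = as.integer(rep(group, 2)), pch = 19,
     xlab = "PC1", ylab = "PC2")
segments(sc1[, 1], sc1[, 2], sc2[, 1], sc2[, 2])  # join the two replicates

# Magnitude of error per individual, compared between groups
err_mag <- sqrt(rowSums((rep1 - rep2)^2))
err_by_group <- tapply(err_mag, group, mean)

# Direction of error relative to the signal: project replicate differences
# onto the unit vector between group means (of averaged replicates)
avg <- (rep1 + rep2) / 2
grp_dir <- colMeans(avg[group == "B", ]) - colMeans(avg[group == "A", ])
grp_dir <- grp_dir / sqrt(sum(grp_dir^2))
proj <- as.vector((rep1 - rep2) %*% grp_dir)
t.test(proj ~ group)  # do the groups differ in error along the signal?
```

The segments in the plot show how far and in which direction replicates of the same individual move relative to the group separation; `err_by_group` and `proj` put rough numbers on the magnitude and directionality of the error.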

Best,

Philipp M.

[email protected] <http://gmail.com/> wrote on Thursday, 3 November 2022
at 16:36:21 UTC+1:

Dear All,
besides the excellent review by Carmelo, I suggest a few other papers
on ME in geometric morphometrics:
Arnqvist, G., Martensson, T. Measurement error in geometric
morphometrics: empirical strategies to assess and reduce its impact on
measures of shape. Acta Zoologica Academiae Scientiarum Hungaricae,
1998, 44: 73–96. (A bit outdated but still wonderfully accurate in how
they explain different sources of ME).
Klingenberg, C.P., Barluenga, M., Meyer, A. Shape Analysis of
Symmetric Structures: Quantifying Variation Among Individuals and
Asymmetry. Evolution, 2002, 56: 1909–1920. (From where most of us have
borrowed the protocol for assessing ME).
Viscosi, V., Cardini, A. Leaf Morphology, Taxonomy and Geometric
Morphometrics: A Simplified Protocol for Beginners. PLoS ONE, 2011, 6:
e25630.
Galimberti, F., Sanvito, S., Vinesi, M.C., Cardini, A. “Nose-metrics”
of wild southern elephant seal (Mirounga leonina) males using image
analysis and geometric morphometrics. Journal of Zoological
Systematics and Evolutionary Research, 2019, 57: 710–720.

There's also another one I like, by the Viennese morphometricians (in
a paper on human mandibles, or teeth, symmetric and asymmetric
variation, if I remember well), but I can't find it now.


In general, the idea is that differences among individuals (averaged
replicates) in a representative sample should be larger than
differences between replicates of the same individual (the estimate of
ME). This is what is tested by 'individual' in the Procrustes ANOVA in
MorphoJ. It might be important to control for main effects in the
analysis. For instance, by including species and sex before individual
in the hierarchical analysis, I 'statistically remove' (with some
assumptions) the average effect of these factors before comparing
individual variation to ME, which makes the test more conservative (NB
whether this is OK or not depends on the question one is asking in
her/his study).
For shape data, even if the P value of individual vs residual is
significant, I would not conclude that ME is negligible for sure. I'd
check that the individual Rsq is much larger than the ME (residual)
Rsq and also that shape distances between replicates of the same
individual are smaller than distances among different individuals (if
this is true, replicates should cluster 'within individual' in a UPGMA
phenogram). Then, I feel a bit more confident that ME might be
negligible.
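
The replicate-clustering check can be sketched in base R as follows. The data are simulated and the names are assumptions: `X` stands in for a matrix of Procrustes-aligned coordinates (one row per digitization), so Euclidean distances between rows play the role of shape distances, and `ind` identifies the individual behind each row.

```r
set.seed(3)
n_ind <- 8; n_var <- 6
truth <- matrix(rnorm(n_ind * n_var, sd = 1), n_ind, n_var)

# Two replicates per individual, with small simulated measurement error
X <- rbind(truth + matrix(rnorm(n_ind * n_var, sd = 0.05), n_ind, n_var),
           truth + matrix(rnorm(n_ind * n_var, sd = 0.05), n_ind, n_var))
ind <- rep(seq_len(n_ind), 2)
rownames(X) <- paste0("ind", ind, ".rep", rep(1:2, each = n_ind))

D <- dist(X)                            # distances between all digitizations
tree <- hclust(D, method = "average")   # average linkage is UPGMA
plot(tree)                              # replicates should pair up at the tips

# For each digitization, is its nearest neighbour a replicate of the same individual?
Dm <- as.matrix(D)
diag(Dm) <- Inf
nearest <- apply(Dm, 1, which.min)
same_ind <- ind[nearest] == ind
mean(same_ind)                          # proportion whose nearest neighbour is their own replicate
```

A proportion well below 1 would indicate that replicates of some individuals are closer to other individuals than to each other, which is the situation where ME can hardly be called negligible.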

If ME is large, it may happen that its Rsq is larger than the
individual Rsq (or, which is the same ME SSQ > individual SSQ). For
the F ratio, however, one should look at the mean SSQ, which takes df
into account. From the MSSQ, one computes F.
The F ratio in MorphoJ employs an isotropic model but, with large
samples (relative to the number of variables), the software also
provides P values using Pillai's trace, which does not depend (if I recall
well!) on an isotropic model. That N is large and the sample
representative is crucial if one is using a subsample in the
assessment of ME to avoid replicate measurements of all individuals,
which would be better but might take too long if one has hundreds or
thousands of individuals.
In R, I generally use adonis that employs an F test (same as in
MorphoJ, for a simple design) but uses permutations instead of
parametric tests. The use of permutations was also suggested as
desirable in Klingenberg et al., 2002. Other packages I suspect might
do something similar, although maybe using different permutational
approaches. I am sure it is explained in their help files.
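
As a worked example of the arithmetic from SSQ to F, the snippet below recomputes the mean squares, Rsq and F from the sums of squares and degrees of freedom of the procD.lm table for the mosquito data quoted elsewhere in this thread. Only the input numbers come from that table; the variable names are illustrative.

```r
# SS and df copied from the procD.lm output quoted in this thread (mosquito data)
ss <- c(ind = 0.069286, residual = 0.041105)
df <- c(ind = 9,        residual = 10)

ms  <- ss / df        # mean squares take the degrees of freedom into account
rsq <- ss / sum(ss)   # share of the total SS for each term
F_ratio <- unname(ms["ind"] / ms["residual"])

round(ms, 7)
round(rsq, 5)
round(F_ratio, 4)
```

F falls below 1 whenever the residual mean square exceeds the individual mean square. Because df differ between terms, this can happen even when the individual SSQ is the larger of the two, and the reverse is also possible.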

Cheers

Andrea

On 03/11/2022, ying yi <[email protected] <>> wrote:

Dear all,
I used the “procD.lm” function in the geomorph package to test the
measurement error. I was surprised to find that the within-groups ANOVA sum
of squares I got was greater than the among-groups ANOVA sum of squares. I
wonder if something went wrong. What does it mean for “procD.lm” function
to get an F value <1?
I would be very happy if someone could help me.
Yours,
Sam

References are as follows:

--
You received this message because you are subscribed to the Google Groups
"Morphmet" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to [email protected] <>.
To view this discussion on the web visit
https://groups.google.com/d/msgid/morphmet2/06065841-c42e-4a58-a5d3-a96eb3c5787dn%40googlegroups.com.



--
E-mail address: [email protected] <>, [email protected] <>
WEBPAGE: https://sites.google.com/view/alcardini2/
or https://tinyurl.com/andreacardini





--
Dr. Andrea Cardini

Researcher, Dipartimento di Scienze Chimiche e Geologiche, Università di Modena e Reggio Emilia, Via Campi, 103 - 41125 Modena - Italy

tel. 0039 059 4223140

Adjunct Associate Professor, Centre for Forensic Anthropology, The University of Western Australia, 35 Stirling Highway, Crawley WA 6009, Australia


E-mail address: [email protected], [email protected]
WEBPAGE: https://sites.google.com/view/alcardini2/
or https://tinyurl.com/andreacardini

