Hi Mike,
you wrote something I am sure I misunderstood:
"small measurement error can look larger when individuals are not that different in shape or large measurement error can look small if they are". ME is relative, and scaling it to the variation in the sample is precisely what we need to do. If 'true' variation is large (say, in a macroevolutionary analysis with many different species and genera), a certain amount of ME might be negligible. If 'true' variation is small (say, with the same landmark configuration I am studying subtle geographic variation among closely related populations), that same amount of ME could be too much. The same holds for any type of measurement (CS, shape, traditional morphometrics).
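To make this concrete, here is a minimal simulation sketch in base R (no geomorph; all names and values are hypothetical). The same absolute digitizing error is added to two samples that differ only in how much the individuals truly vary. The residual ("ME") R² is then computed as in a one-factor Procrustes ANOVA, with sums of squares pooled over coordinates. Coordinates are treated as plain variables, without superimposition.

```r
# Minimal sketch (hypothetical values): same absolute ME, two levels of true variation
set.seed(1)
n_ind <- 20    # individuals
reps <- 2      # replicates per individual
q <- 10        # number of coordinates (plain variables, no superimposition)
me_sd <- 0.01  # identical digitizing error in both scenarios

resid_rsq <- function(bio_sd) {
  truth <- matrix(rnorm(n_ind * q, sd = bio_sd), n_ind, q)       # true individual values
  ind <- factor(rep(seq_len(n_ind), each = reps))
  Y <- truth[as.integer(ind), ] +                                 # replicate each individual
    matrix(rnorm(n_ind * reps * q, sd = me_sd), n_ind * reps, q)  # add the same ME
  ss_total <- sum(scale(Y, scale = FALSE)^2)                      # SS pooled over coordinates
  ss_res <- sum(residuals(lm(Y ~ ind))^2)                         # within-individual SS
  ss_res / ss_total                                               # residual ("ME") Rsq
}

rsq <- c(small_variation = resid_rsq(bio_sd = 0.01),
         large_variation = resid_rsq(bio_sd = 0.10))
round(rsq, 3)
```

With these settings the residual R² should be much larger in the low-variation sample, even though the error added is identical.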

Your simulation seems to aim at relating the relative estimate of ME I get from the Procrustes ANOVA (the two R2s) to absolute variation due to digitization error on the original specimens. Am I correct?


Philipp's point, in my interpretation, was about something else: whether ME biases results with respect to the very specific study question(s). If ME is random, he argued, I could tolerate a very large ME, potentially even one larger than the variation among individuals in a sample. For instance, he says that, if ME is random, it will end up in the last PCs and, if I use only the first ones, results will be unaffected. I see the point from a statistical perspective, but from a biological one I worry about analyzing data that are mostly noise. If that's the only option, I might do it with due acknowledgement of the problem and big caveats: can I really be sure that such a huge error is not affecting results in subtle ways? Is the error in shape really random after the superimposition? For within-configuration analyses of modularity/integration it certainly is not, as I showed with my simulations of isotropic variance: the bias will always be there, sometimes negligible and sometimes not (who knows?). Even with a simple PCA when p/N is huge, especially with slid semilandmarks, I get a pattern (a dominant PC1) from isotropic noise.

Again, assuming I am correct, Philipp suggests looking for structure in the error component (the differences between replicates) to understand whether it may bias results for the specific question I am asking. As I said, I really like this idea, but I would like to see a variety of examples in publications, and I am not sure how easy it is to do this objectively (not purely based on judgement) and exhaustively. I must be able to detect the effect of a potential bias on all the questions I am investigating (there may be many) as well as on all the assumptions of the models in the analyses. I think Philipp mentioned path analysis as an option, which is something I'll have to learn more about.
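One simple way to look for structure in the error component might be sketched as follows, on simulated data. The replicate differences are computed, their magnitude is correlated with an external variable, and a PCA of the differences shows whether the error concentrates in a few directions. The variable `size` and the assumed error model (smaller specimens digitized less precisely) are hypothetical, chosen only so that there is some structure to find.

```r
# Sketch with simulated data: does the error component have structure?
set.seed(7)
n <- 30   # individuals, each digitized twice
q <- 8    # coordinates (treated as plain variables)
size <- runif(n, 5, 15)                               # hypothetical external variable
truth <- matrix(rnorm(n * q), n, q)
err_sd <- 0.5 / size                                  # assumed: small specimens are harder to digitize
rep1 <- truth + matrix(rnorm(n * q), n, q) * err_sd   # row i is scaled by err_sd[i]
rep2 <- truth + matrix(rnorm(n * q), n, q) * err_sd

d <- rep2 - rep1                                      # error component: replicate differences
err_mag <- sqrt(rowSums(d^2))                         # magnitude of the error per individual
rho <- cor(err_mag, size, method = "spearman")        # does the error depend on size?
rho

# Direction: share of error variance on each PC of the differences.
# Roughly even shares suggest no dominant direction of error.
pc_err <- prcomp(d, center = FALSE)
round(pc_err$sdev^2 / sum(pc_err$sdev^2), 2)
```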


I feel more confident, for now, with Philipp's other point, which is the basic rationale of my approach: "one can argue that if measurement error is very small, then randomness and homogeneity across groups are less of an issue. But in this case the error really needs to be negligibly small, not just smaller than the individual variation". This is why I tend to use a fairly conservative approach and I am not happy with relying on a P value. The R2 of the effect I am interested in must be much larger than that of ME, and replicates should cluster tightly around each individual. If the design used for collecting the replicates is accurate, that should take care of the bias. If not, it is tricky, because I could have an apparently small ME with an important bias. The example I mentioned of strong group structure in data collected in two chunks shows what happens with a flawed design. The colleague who measured the skulls was confident that ME was negligible because he tested it in the first chunk of data collection; however, the correct way to assess ME would have been to have at least a subsample of the same individuals measured in both chunks. That would have included the effect of the long time between the two data collections as well as of the different Microscribe used for landmarking. I bet the results would have been different from the original assessment of ME in the first data collection (much larger ME relative to sample variance). With a well-designed set of replicates, in fact, we could have estimated the different sources of error, found where the problem was and maybe tried a correction. Unfortunately, he had replicates only for the first dataset. The way I found the problem follows the same philosophy as Philipp's message and, with limitations, may sometimes work even without replicates: there was a pattern in the data (clear group separation on PC1 in relation to the time of data collection), and the most parsimonious explanation was ME.
In the end, the bias was obvious and data could not be used.
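A minimal sketch of the design described above, on simulated data: the same individuals are digitized in both sessions, and session 2 carries a hypothetical systematic offset (for example, a different device). With both sessions replicated, a sequential (Type I) decomposition pooled over coordinates separates a session term from the random digitizing error. With replicates only in session 1, this term could not be estimated at all. All names and values here are assumptions for illustration.

```r
set.seed(3)
n_ind <- 15   # individuals measured in BOTH sessions
q <- 10       # coordinates (plain variables)
truth <- matrix(rnorm(n_ind * q, sd = 0.05), n_ind, q)
shift <- rep(c(0.03, -0.03), length.out = q)     # hypothetical systematic offset of session 2
dat <- expand.grid(ind = factor(seq_len(n_ind)), session = factor(1:2))
Y <- truth[as.integer(dat$ind), ] +
  outer(dat$session == "2", shift) +                       # offset only on session-2 rows
  matrix(rnorm(nrow(dat) * q, sd = 0.01), nrow(dat), q)    # random digitizing error

ind <- dat$ind
session <- dat$session
rss <- function(fit) sum(residuals(fit)^2)     # residual SS pooled over coordinates
ss_total <- sum(scale(Y, scale = FALSE)^2)
rss_ind <- rss(lm(Y ~ ind))
rss_full <- rss(lm(Y ~ ind + session))
rsq <- c(ind = (ss_total - rss_ind) / ss_total,
         session = (rss_ind - rss_full) / ss_total,
         residual = rss_full / ss_total)
round(rsq, 3)
```

Because the design is balanced, the order of `ind` and `session` in the model does not change the decomposition here.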

Nice discussion.
Cheers

Andrea





On 04/11/2022 18:59, Mike Collyer wrote:
I agree with Philipp’s main point that it can be dangerous to quantify 
measurement error as a value based on (likely a ratio including) the variation 
among individuals on which the variation between repeated digitizations is also 
made, if it is not clear how variable those individuals are.  I was seeking 
some examples to demonstrate that small measurement error can look larger when 
individuals are not that different in shape or large measurement error can look 
small if they are.  I was not very successful before Philipp responded.  
However, I did play with the “mosquito” data set in geomorph, which led me in a 
different direction.  I chose this data set because it contains two replicate 
configurations for each individual.

For context, here is the analysis I considered:

library(geomorph)
data("mosquito")

# use just one side for demonstration
# residual SS can be considered the basis for measurement error

lmks <- mosquito$wingshape[,, which(mosquito$side == 1)]
ind <- mosquito$ind[ which(mosquito$side == 1)]
GPA <- gpagen(lmks, print.progress = FALSE)
summary(procD.lm(coords ~ ind, data = GPA))

Analysis of Variance, using Residual Randomization
Permutation procedure: Randomization of null model residuals
Number of permutations: 1000
Estimation method: Ordinary Least Squares
Sums of Squares and Cross-products: Type I
Effect sizes (Z) based on F distributions

           Df       SS        MS     Rsq      F      Z Pr(>F)
ind        9 0.069286 0.0076984 0.62764 1.8729 2.6261  0.006 **
Residuals 10 0.041105 0.0041105 0.37236
Total     19 0.110390
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Call: procD.lm(f1 = coords ~ ind, data = GPA)

It might be alarming that the residual Rsq is 0.37236, which is the portion of 
variation attributed to multiple measurements on the same individuals.  That 
might seem high.  I grew quickly tired of searching for a similar data set with 
contrasting results and decided that maybe I could just simulate measurement 
error and ask if the residual SS here was large compared to what I simulated.  
I thought about this as a process and came to the conclusion that one could 
simulate landmark wobble (like a shaky hand) by sampling it from a normal 
distribution whose standard deviation is a fixed fraction of the centroid 
size.  For example, a 5% error could mean that the standard deviation of the 
distribution from which a random value is sampled (x, y, or z coordinate) is 
0.05 * CS for that configuration.  (The shakiness scales with the size of the 
object.)
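The wobble idea can be sketched for a single configuration in base R as follows. Centroid size is the square root of the summed squared distances of the landmarks from their centroid, and the wobble sd is `per.error` times that. `centroid_size` and `wobble` are hypothetical helpers written for illustration, not geomorph functions.

```r
# Centroid size: square root of the summed squared distances of the landmarks from their centroid
centroid_size <- function(config) {
  centred <- sweep(config, 2, colMeans(config))   # subtract the centroid from each landmark
  sqrt(sum(centred^2))
}

# Hypothetical helper: add "shaky hand" wobble with sd = per.error * CS
wobble <- function(config, per.error = 0.05) {
  sd_err <- per.error * centroid_size(config)
  config + matrix(rnorm(length(config), sd = sd_err), nrow(config), ncol(config))
}

square <- matrix(c(0, 0,  1, 0,  1, 1,  0, 1), ncol = 2, byrow = TRUE)  # p = 4 landmarks, k = 2
centroid_size(square)       # each corner is sqrt(0.5) from the centre, so CS = sqrt(2)
centroid_size(2 * square)   # doubling the object doubles CS, and hence the wobble sd
set.seed(1)
wobble(square, per.error = 0.05)  # each coordinate's error has sd 0.05 * sqrt(2), about 0.071
```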

I ended up making a function that could simulate a measurement error outcome.  
Here is the function, in case anyone might find it useful (I have not tested 
this, so please expect clunkiness…).  One supplies a set of coordinates (assumed 
to be a 3D array), the number of replicates to simulate (the observed 
configuration counts as 1), and the proportion of centroid size (e.g. 0.05) to 
use as the sd of a random sample from a normal distribution.  It performs a 
Procrustes ANOVA on the simulated data (after GPA).


makeME <- function(coords, reps = 2, per.error = 0.05){ # per.error means sd = per.error * Csize
   if(reps < 2)
     stop("Must have more than 1 replicate to run this.")
   dims <- dim(coords)
   p <- dims[1]   # number of landmarks
   k <- dims[2]   # number of dimensions
   n <- dims[3]   # number of specimens

   nms <- dimnames(coords)[[3]]
   if(is.null(nms)) nms <- paste("spec", 1:n, sep = "")
   Coords <- lapply(1:n, function(x) as.matrix(coords[,, x]))
   nnms <- paste(rep(nms, each = reps), 1:reps, sep = ".rep")
   newCoords <- rep(Coords, each = reps)   # reps copies of each specimen
   names(newCoords) <- nnms
   initGPA <- gpagen(coords, print.progress = FALSE, max.iter = 1)
   Csize <- rep(initGPA$Csize, each = reps)
   # the first copy of each specimen is left as observed; the others get wobble
   err <- rep(c(0, rep(per.error, reps - 1)), n)
   for(i in seq_along(err))
     newCoords[[i]] <- newCoords[[i]] + rnorm(p * k, sd = err[i] * Csize[i])
   newCoords <- simplify2array(newCoords)   # back to a p x k x (n * reps) array
   GPA <- gpagen(newCoords, print.progress = FALSE)
   ind <- factor(rep(1:n, each = reps))
   return(summary(procD.lm(coords ~ ind, data = GPA)))
}

And as an example application, using the same data as above:

makeME(mosquito$wingshape[,, which(mosquito$side == 1)])

Analysis of Variance, using Residual Randomization
Permutation procedure: Randomization of null model residuals
Number of permutations: 1000
Estimation method: Ordinary Least Squares
Sums of Squares and Cross-products: Type I
Effect sizes (Z) based on F distributions

           Df      SS       MS     Rsq      F      Z Pr(>F)
ind       19 0.91707 0.048267 0.56455 1.3647 3.2186  0.002 **
Residuals 20 0.70736 0.035368 0.43545
Total     39 1.62442
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Call: procD.lm(f1 = coords ~ ind, data = GPA)

So I might conclude from this that if I allowed my digitizing to vary by 5% of 
centroid size, it appears my observed digitization has a measurement error less 
than that, which might help me to feel confident.  In case I worry that this 
one random outcome is not fully representative, the following function allows 
me to run many simulations (100 as an example)


simulate.makeME <- function(coords, reps = 2, per.error = 0.05, nsims = 100) {
   result <- sapply(1:nsims, function(j) {
     cat("sim:", j, "... ")
     res <- makeME(coords, reps, per.error)
     res$table$Rsq[2]}
     )
   cat("\n\n")
   names(result) <- paste("sim", 1:nsims, sep = ".")
   result
}

ME.sims <- simulate.makeME (mosquito$wingshape[,, which(mosquito$side == 1)], 
reps = 2, per.error = 0.05, nsims = 100)
summary(ME.sims) # just Rsq
    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.4264  0.4423  0.4474  0.4476  0.4533  0.4729

So now I feel really confident that measurement error is probably not a worry, 
based on results from a process that imposes a certain level of measurement 
error.

I might also start to wonder when imposing the randomness starts to approach 
what I see in my empirical example.

makeME(mosquito$wingshape[,, which(mosquito$side == 1)], per.error = 0.03)

Analysis of Variance, using Residual Randomization
Permutation procedure: Randomization of null model residuals
Number of permutations: 1000
Estimation method: Ordinary Least Squares
Sums of Squares and Cross-products: Type I
Effect sizes (Z) based on F distributions

           Df      SS       MS     Rsq      F      Z Pr(>F)
ind       19 0.49153 0.025870 0.62935 1.7873 5.7972  0.001 **
Residuals 20 0.28948 0.014474 0.37065
Total     39 0.78101
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Call: procD.lm(f1 = coords ~ ind, data = GPA)

These results mimic my observed empirical results pretty well.  Maybe I can 
infer from this that my digitizing could be off by as much as 3% and produce 
results like I observed?

This is a different way of approaching the problem than calculating and trying 
to make sense of a statistic that might resemble an effect size, but it feels 
more informative to me.  I am not sure that it is smart to scale the amount of 
variation with centroid size (one might have large and small individuals but 
can zoom in or out to better capture landmark locations), so the function could 
be rewritten not to include centroid size as a variable.  I scaled by centroid 
size so that the error is simulated on the digitized specimens, but it could 
also be added to configurations already scaled to unit size (after GPA).  I am 
also not sure that it is smart to sample from a normal distribution.  Maybe 
sampling from a uniform distribution would better resemble digitizing 
shakiness.  I only wandered so far into the weeds with this.
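If one wanted to try the uniform alternative, a sketch might match its standard deviation to the normal case so that only the shape of the error distribution changes. A uniform on (-a, a) has sd a / sqrt(3), so a = sqrt(3) * sd. Applied to configurations already scaled to unit centroid size, `sd` would directly be the fraction of CS. `wobble_error` is a hypothetical helper.

```r
# Two hypothetical error distributions with the SAME standard deviation.
# A uniform on (-a, a) has sd a / sqrt(3), so a = sqrt(3) * sd.
wobble_error <- function(n, sd, dist = c("normal", "uniform")) {
  dist <- match.arg(dist)
  if (dist == "normal") {
    rnorm(n, sd = sd)
  } else {
    runif(n, min = -sqrt(3) * sd, max = sqrt(3) * sd)
  }
}

set.seed(11)
e_norm <- wobble_error(1e5, sd = 0.02, dist = "normal")
e_unif <- wobble_error(1e5, sd = 0.02, dist = "uniform")
c(normal = sd(e_norm), uniform = sd(e_unif))   # both close to 0.02
max(abs(e_unif))                               # bounded by sqrt(3) * 0.02, about 0.035
```

The main practical difference is that uniform wobble never exceeds a fixed maximum displacement, whereas normal wobble occasionally produces large slips.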

I think this might qualify as an additional exploratory approach, and I agree 
with Philipp that making sense of the magnitude and direction of the 
differences between repeated measures, even if only viewed in a PC plot, is 
rather important.  I’m sure this could be improved if someone wants to play 
more with other data sets.

Cheers!
Mike

On Nov 4, 2022, at 10:38 AM, [email protected] <[email protected]> wrote:

Dear all,

I'd like to challenge this view on measurement error, as summarized by Andrea, a 
bit more generally.

Clearly, measurement error should be "small," but I disagree that "the idea is that 
differences among individuals (averaged replicates) in a representative sample should be larger than 
differences between replicates of the same individual". First, the between-individual variance (or mean 
sum of squares, MSS) depends on the choice of individuals. For instance, if the sample comprises different 
species, the MSS between individuals is much larger than for a sample of a single species, and the error MSS 
in relation to the individual MSS is much smaller in the multi-species sample. Hence, whether or not the 
error MSS is larger than the between-individual MSS is somewhat arbitrary and of secondary importance anyway. 
"Controlling for main effects," as suggested by Andrea, is possible but it removes the actual 
signal against which I may want to compare the error. In either case, the p-value of the MANOVA is
uninformative because the underlying H0 is irrelevant.

In my opinion, it is more important that the error is unrelated to the signal of interest 
("random"), rather than that it is small in terms of some summary statistic. For 
instance, if in a growth study the measurement error is uncorrelated with the age effects, the 
error "averages out" (if sample size is large enough) and does not bias the average 
growth trajectory, even if the error is large. The same applies to group differences. MANOVA does 
not inform about this independence. Moreover, it pools over all shape coordinates. For instance, it 
does not inform us if the error is large for shape features of interest (those that differ between 
groups or correlate with age, etc.) or for shape features of less interest.

Note also that most morphometric analyses are based on a few principal 
components (or similar statistics) of the shape coordinates. PCs are linear 
combinations, i.e., weighted averages, of the shape coordinates. Hence, group 
means in a PC plot are averages over all cases AND all variables, so that 
random error can be expected to be small. Another issue to consider: If 
measurement error is indeed approximately isotropic, it has a similar magnitude 
for all shape features (all directions of shape space). The individual 
variance, however, typically is much greater for large-scale shape features 
than for small-scale features, and the relative magnitude of measurement error 
decreases with increasing spatial scale. PCs typically capture large-scale 
shape variation, where the relative error is expected to be smaller. The same 
applies to the symmetric vs. asymmetric components, the latter of which has 
much smaller individual variance and hence greater relative measurement error.
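A small simulation sketch of the scale argument follows. Individual variation is assumed to decline from "large-scale" to "small-scale" features, and isotropic error (same sd in every direction) is added. The error's share of the variance along each observed PC is then small on PC1 and large on the last PCs. Coordinates are treated as plain variables, and the variance profile is a hypothetical choice.

```r
set.seed(9)
n <- 500
q <- 10
bio_sd <- seq(1, 0.1, length.out = q)   # assumed: variation declines from large- to small-scale features
bio <- sweep(matrix(rnorm(n * q), n, q), 2, bio_sd, "*")
err <- matrix(rnorm(n * q, sd = 0.1), n, q)   # isotropic error: same sd in every direction
pc <- prcomp(bio + err)

err_var <- apply(err %*% pc$rotation, 2, var)   # error variance along each observed PC
rel_err <- err_var / pc$sdev^2                  # error share of the variance on each PC
round(rel_err, 2)
```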

The situation is slightly different in studies that compare shape variances, 
not means, between groups, between symmetric and asymmetric components, or 
among spatial scales. In contrast to mean estimates, measurement error does not 
average out for these variance estimates. It is thus important that magnitude 
and pattern of measurement error are constant (not necessarily small) across 
groups or components so that observed differences in variance are attributable 
to biological factors rather than systematic differences in measurement error. 
Measurement error is most challenging when comparing entire variance-covariance 
matrices. But again, MANOVA is not the way to assess homogeneity of measurement 
error across groups.
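For variances the error adds rather than averaging out: if error is independent of the biological signal, observed variance ≈ biological variance + error variance, whatever the sample size. In this minimal sketch, two groups have the same true variance but (by assumption) different measurement error, for instance because they were digitized by different observers.

```r
set.seed(2)
n <- 2000
bio_a <- rnorm(n, sd = 1)             # group A, true individual variation
bio_b <- rnorm(n, sd = 1)             # group B, same true variation
obs_a <- bio_a + rnorm(n, sd = 0.2)   # assumed: small ME in group A
obs_b <- bio_b + rnorm(n, sd = 0.8)   # assumed: larger ME in group B
c(A = var(obs_a), B = var(obs_b))     # expected about 1 + 0.04 and 1 + 0.64
```

The difference in observed variance here is entirely due to measurement error, and increasing `n` does not make it go away.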

If the sample is properly randomized before measurement, it is reasonable to 
assume that measurement error is approximately uncorrelated with the signal of 
interest. But there can be exceptions. For instance, younger and smaller 
individuals can be harder to measure than older and larger individuals. 
Measurement error can thus correlate with age. I discussed this in Mitteroecker 
P, Stansfield E (2021) A model of developmental canalization, applied to human 
cranial form. PLOS Computational Biology 17 (2): e1008381

Clearly, one can argue that if measurement error is very small, then randomness 
and homogeneity across groups are less of an issue. But in this case the error 
really needs to be negligibly small, not just smaller than the individual 
variation.

Instead of somewhat meaningless scalar summary statistics (like the F-ratio or 
some multivariate version of it), I thus prefer an exploratory approach. In the 
simplest case, a PCA of the data, including the replicated specimens, can show 
the magnitude and directionality of measurement error in relation to the signal 
of interest (e.g., group differences, growth trajectories). Measurement error 
can also be correlated with external variables (e.g., age) or compared among 
groups, but to my knowledge little work has been done in this direction in 
geometric morphometrics. Alternatives are errors-in-variables models and 
structural equation models, which build estimates of measurement error into the 
model from the outset.

Best,

Philipp M.





On Thursday, 3 November 2022 at 16:36:21 UTC+1, [email protected] 
wrote:
Dear All,
besides the excellent review by Carmelo, I suggest a few other papers
on ME in geometric morphometrics:
Arnqvist, G., Martensson, T. Measurement error in geometric
morphometrics: empirical strategies to assess and reduce its impact on
measures of shape. Acta Zoologica Academiae Scientiarum Hungaricae,
1998, 44: 73–96. (A bit outdated but still wonderfully accurate in how
they explain different sources of ME).
Klingenberg, C.P., Barluenga, M., Meyer, A. Shape Analysis of
Symmetric Structures: Quantifying Variation Among Individuals and
Asymmetry. Evolution, 2002, 56: 1909–1920. (From where most of us have
borrowed the protocol for assessing ME).
Viscosi, V., Cardini, A. Leaf Morphology, Taxonomy and Geometric
Morphometrics: A Simplified Protocol for Beginners. PLoS ONE, 2011, 6:
e25630.
Galimberti, F., Sanvito, S., Vinesi, M.C., Cardini, A. “Nose-metrics”
of wild southern elephant seal (Mirounga leonina) males using image
analysis and geometric morphometrics. Journal of Zoological
Systematics and Evolutionary Research, 2019, 57: 710–720.

There's also another one I like, by the Viennese morphometricians (in
a paper on human mandibles, or teeth, symmetric and asymmetric
variation, if I remember well), but I can't find it now.


In general, the idea is that differences among individuals (averaged
replicates) in a representative sample should be larger than
differences between replicates of the same individual (the estimate of
ME). This is what is tested by 'individual' in the Procrustes ANOVA in
MorphoJ. It might be important to control for main effects in the
analysis. For instance, by including species and sex before individual
in the hierarchical analysis, I 'statistically remove' (with some
assumptions) the average effect of these factors before comparing
individual variation to ME, which makes the test more conservative (NB:
whether this is OK or not depends on the question one is asking in
her/his study).
For shape data, even if the P value of individual vs residual is
significant, I would not conclude that ME is negligible for sure. I'd
check that the individual Rsq is much larger than the ME (residual)
Rsq and also that shape distances between replicates of the same
individual are smaller than distances among different individuals (if
this is true, replicates should cluster 'within individual' in a UPGMA
phenogram). Then, I feel a bit more confident that ME might be
negligible.
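The UPGMA check can be sketched in base R with `hclust(..., method = "average")` on simulated data. Euclidean distances on aligned coordinates are used here as the usual approximation to Procrustes distances. The phenogram is cut into as many clusters as there are individuals. Replicates then cluster "within individual" exactly when both replicates of every individual fall in the same cluster. Data and names are hypothetical.

```r
set.seed(4)
n_ind <- 12   # individuals
q <- 10       # coordinates (plain variables here)
truth <- matrix(rnorm(n_ind * q, sd = 0.05), n_ind, q)   # true individual values
ind <- rep(seq_len(n_ind), each = 2)                       # two replicates each
Y <- truth[ind, ] + matrix(rnorm(length(ind) * q, sd = 0.005), length(ind), q)  # add ME
rownames(Y) <- paste0("spec", ind, ".rep", 1:2)

tree <- hclust(dist(Y), method = "average")   # UPGMA phenogram
groups <- cutree(tree, k = n_ind)             # cut into n_ind clusters
# With n_ind clusters, replicates cluster within individual exactly when
# both replicates of every individual share a cluster.
replicates_cluster <- all(tapply(groups, ind, function(x) length(unique(x)) == 1))
replicates_cluster
```

`plot(tree)` gives the phenogram itself for a visual check.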

If ME is large, it may happen that its Rsq is larger than the
individual Rsq (or, equivalently, ME SSQ > individual SSQ). For
the F ratio, however, one should look at the mean SSQ (MSSQ), which takes
df into account. From the MSSQ, one computes F.
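As a worked example, here are the numbers from the mosquito table in Mike's message (individual vs. residual). R² and F are built from the same sums of squares, but F first divides each by its degrees of freedom:

```latex
R^2_{ind} = \frac{SS_{ind}}{SS_{total}} = \frac{0.069286}{0.110390} \approx 0.628,
\qquad
R^2_{res} = \frac{SS_{res}}{SS_{total}} = \frac{0.041105}{0.110390} \approx 0.372

MS_{ind} = \frac{SS_{ind}}{df_{ind}} = \frac{0.069286}{9} \approx 0.0076984,
\qquad
MS_{res} = \frac{SS_{res}}{df_{res}} = \frac{0.041105}{10} \approx 0.0041105

F = \frac{MS_{ind}}{MS_{res}} = \frac{0.0076984}{0.0041105} \approx 1.873
```

An F value below 1 therefore means that the mean square among individuals is smaller than the residual (replicate) mean square. In other words, the data show no individual variation beyond what replicates of the same specimen already differ by.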
The F ratio in MorphoJ employs an isotropic model but, with large
samples (relative to the number of variables), the software also
provides P values using Pillai's trace, which does not depend (if I recall
well!) on an isotropic model. That N is large and the sample
representative is crucial if one is using a subsample in the
assessment of ME to avoid replicate measurements of all individuals,
which would be better but might take too long if one has hundreds or
thousands of individuals.
In R, I generally use adonis that employs an F test (same as in
MorphoJ, for a simple design) but uses permutations instead of
parametric tests. The use of permutations was also suggested as
desirable in Klingenberg et al., 2002. Other packages I suspect might
do something similar, although maybe using different permutational
approaches. I am sure it is explained in their help files.
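A minimal sketch of a permutation F test for the individual term follows, with specimens shuffled among individual labels (full randomization, similar in spirit to adonis). As far as I understand, for a single-factor model this coincides with randomizing residuals of the intercept-only null model. It is not a substitute for the packages' own implementations, and `perm_F_test` is a hypothetical helper.

```r
perm_F_test <- function(Y, g, n_perm = 999) {
  g <- factor(g)
  ss_total <- sum(scale(Y, scale = FALSE)^2)    # total SS pooled over coordinates
  df_between <- nlevels(g) - 1
  df_res <- nrow(Y) - nlevels(g)
  f_stat <- function(groups) {
    ss_res <- sum(residuals(lm(Y ~ groups))^2)   # within-group (replicate) SS
    ((ss_total - ss_res) / df_between) / (ss_res / df_res)
  }
  f_obs <- f_stat(g)
  f_perm <- replicate(n_perm, f_stat(sample(g)))   # shuffle labels among specimens
  list(F = f_obs, p = (sum(f_perm >= f_obs) + 1) / (n_perm + 1))
}

set.seed(8)
ind <- rep(1:10, each = 2)                    # 10 individuals x 2 replicates
Y <- matrix(rnorm(10 * 6), 10, 6)[ind, ] +    # true individual values
  matrix(rnorm(20 * 6, sd = 0.3), 20, 6)      # replicate error
res <- perm_F_test(Y, ind, n_perm = 199)
res
```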

Cheers

Andrea

On 03/11/2022, ying yi <[email protected]> wrote:
Dear all,
I used the “procD.lm” function in the geomorph package to test the
measurement error. I was surprised to find that the within-groups ANOVA sum
of squares I got was greater than the among-groups ANOVA sum of squares. I
wonder if something went wrong. What does it mean when the “procD.lm”
function gives an F value < 1?
I would be very happy if someone could help me.
Yours,
Sam

References are as follows:

--
You received this message because you are subscribed to the Google Groups
"Morphmet" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/morphmet2/06065841-c42e-4a58-a5d3-a96eb3c5787dn%40googlegroups.com.



--
E-mail address: [email protected], [email protected]
WEBPAGE: https://sites.google.com/view/alcardini2/
or https://tinyurl.com/andreacardini




--
Dr. Andrea Cardini
Researcher, Dipartimento di Scienze Chimiche e Geologiche, Università di Modena e Reggio Emilia, Via Campi, 103 - 41125 Modena - Italy
tel. 0039 059 4223140

Adjunct Associate Professor, Centre for Forensic Anthropology, The University of Western Australia, 35 Stirling Highway, Crawley WA 6009, Australia

E-mail address: [email protected], [email protected]
WEBPAGE: https://sites.google.com/view/alcardini2/
or https://tinyurl.com/andreacardini

