Dear Morphmetricians,
I want to compare statistically if two vectors of PC1 loadings differ in
terms of their direction which can be measured by the angle between them.
Over the years similar questions appear over morphomet (e.g. thread1
<https://www.mail-archive.com/[email protected]/msg02270.html>,
thread2
<https://www.mail-archive.com/[email protected]/msg02148.html>,
...) but they are mostly related to geometric morphometrics approaches
(e.g. approach used in MorphoJ, part of PTA in geomorph, or customized
approaches provided in papers outlined in this Dean Adams' post
<https://www.mail-archive.com/[email protected]/msg02276.html>).
It might be important to mention that MorphoJ and geomorph treat the null
hypothesis (H0) differently when comparing angles (e.g. see the post of
<https://www.researchgate.net/post/Does-anyone-have-experience-with-vector-analysis-and-3D-data-in-MorphoJ>Miriam
L. Zelditch on RG
<https://www.researchgate.net/post/Does-anyone-have-experience-with-vector-analysis-and-3D-data-in-MorphoJ>),
so one should be aware of potential differences in results even though the
same data have been used.
In this post, I would like to compare if males and females differ in their
(multivaraite) size which is defined as a vector of PC1 loadings obtained
from decomposition of covariance matrix (not correlation) of log
transformed linear distances (Klingenberg, 1996
<https://morphometrics.uk/PDF_files/WhiteBook.pdf>). Such PC1 vector of
logged variables corresponds to a growth-axis, recording both isometric and
allometric variations. If groups (e.g. sexes) share the same overall size -
PC1 loadings of sexes have similar direction statistically speaking - it is
reasonable to treat PC2-PCn as descriptors of overall shape (e.g. they
correspond to variation not explained by general growth) of animals and
"analytical" size-correction (e.g. PCA shearing method) is possible, yet
this approach has limitations (see McCoy et al., 2006
<https://www.researchgate.net/publication/7176310_Size_correction_Comparing_morphological_traits_among_populations_and_environments>
among
others).
If I understood correctly, two questions (or assumptions) are important
here:
1) Is a vector of PC1 loadings for each sex isometric (in other words
colinear?)
2) PC1 loadings between sexes are pointing in same direction as otherwise
"size-correction" will be biased just because groups do not share common
size axis.
*The first assumption* is explained and coded in the book "Morphometrics
with R"
<https://www.researchgate.net/publication/258885152_Morphometrics_With_R>
written
by Julien Claude (pages 103-104)
<https://www.researchgate.net/publication/350439729_Rfunctions1_R_functions_for_geometric_morphometrics_developed_in_Claude_J_2008_Morphometrics_with_R?_sg%5B0%5D=R6_3W-64FZ3h43VzvFZh02awzYTBePhLB2hxLM3Tpoqeb0H2D5-xl4ZgiW5kn_xyQX7vqs_v4qrB67eqEXAOF9LseEVn3jo8NDHNJ0D0.NRfonsjUFiXnahK4ajI4NPb-t5OufVpHlO8TyNgfZo2v1f5dXnNe-hpILpjNZFCtMIgVaPq0wY7TsjSU9RAdmw>.
Function name is "coli" and according to text from the book it evaluates a
possible statistic for isometry. More specifically, the function tests if
sex-specific PC1 loadings differ from theoretical isometric vector (*i*)
which has the same length (number of elements, *p*) as the length of PC1
loadings but all values in *i* are identical to 1/sqrt (p). I will treat
this as a "Test 1".
*The second assumption *can be tested with permutations explained in the
work of Tzeng and Yeh (1999)
<https://zoolstud.sinica.edu.tw/Journals/38.1/10.pdf> which used
randomization in order to access statistical significance of observed angle
between two vectors. I will treat this test as as a "Test 2".
For the demonstration purpose and reproducibility I will use dataset of
three linear measurements on 48 turtles (24 males and 24 females) (Jolicoeur
and Mosimann, 1960 <http://pbil.univ-lyon1.fr/ADE-4/ade4-html/tortues.html>).
Codes for functions (coli and coli.perm) as well as downstream analyses
with reproducible results are in the "functions and example.txt" file
attached in this email.
-------------------------------------------------------------
run the code form the "functions and example.txt" file attached
or see results provided in the file.
-------------------------------------------------------------
*Questions:*
1. Is the H0 for Test 1 as follows: "Two vectors are not colinear (not
similar) no more than expected by chance (randomness)?" Likewise, if we
reject H0, we can say that two vectors are colinear and PC1 vector follows
theoretical isometry. Is this correct? What is puzzling me is the part
related to calculation of p value: it is "H0 < observed" but not "H0 >
observed"...
2. In accordance to Test 2, H0 is defined as: "The null hypothesis that we
want to test is that the 2 first eigenvecters are the same (pointing in the
same direction?)." If so, given that we rejected H0 in "test2", we can
conclude that males and females do not share the common size axis.
Therefore, if one would like to use PCA shearing method it would be
problematic here and other methods might be used instead each having it's
own limitation (CPCA, Burnaby-Back-Projection and so on...).
3. If we compare "test2" and "try" objects (PC1 males vs. PC1 females) we
came up with similar p values (test2: p = 0.0013 and try: p = 0.009) but
their conclusions are not the same. For instance, if we apply verbal
argumentation Test 2 says "PC1 vectors of sexes are not pointing in the
same direction" and test "try" says "PC1 vectors of sexes are colinear".
This sounds counter-intuitive... or not. I am kind of lost what are H0 for
those tests. Does this test "try" have any relations to the "test2" at all
in terms of conclusions and are they even comparable?
-------------------------------------------------------------
I appreciate any input.
Kind regards,
Marko
--
You received this message because you are subscribed to the Google Groups
"Morphmet" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
To view this discussion on the web visit
https://groups.google.com/d/msgid/morphmet2/9c418b9f-6a10-4f08-adb9-91453481e724n%40googlegroups.com.
## Functions
# Test 1
coli <- function (ev1, ev2, nperm = 1000, graph = TRUE) {
# coli function written by Julien Claude (2008)
# this one is updated version from Julien Claude RG
#
https://www.researchgate.net/publication/350439729_Rfunctions1_R_functions_for_geometric_morphometrics_developed_in_Claude_J_2008_Morphometrics_with_R
dist <- numeric (nperm)
n <- length (ev1)
Angle <- function (v1, v2) {
# angle is in degrees not radians
90 - abs ((180 * (acos (sum (v1 * v2) / (sqrt (sum (v1^2)) * sqrt
(sum (v2^2))))) / pi)
- 90)
}
for (i in 1:nperm) {# random vectors of size n generated from uniform
distribution
X1 <- runif (n, -1, 1)
X2 <- runif (n, -1, 1)
dist[i] <- Angle (X1, X2)
}
zobs <- Angle(ev1, ev2) # observed angle
# significance
pv <- length (dist[dist < zobs]) / nperm # IN THE BOOK originally it
was
# "length(dist[dist>zobs])/nperm" but current version is after errata
1.81
if (graph) {
hist (dist, breaks = 50,
main = "Distribution of angles between 2 random vectors", xlab
= "Z statistic",
ylab = "# of random vectors", sub = paste ("Actual z-obs =",round
(zobs, 5),":
p<", round (pv), 5))
abline (v = zobs)
}
list (angle = zobs, p = pv)
}
# Test 2
coli.perm <- function (data, factor, nperm = 10000, graph = TRUE) {
# coli.perm function written by Marko Djurakic
# function based on permutations according to Tzeng and Yeh (1999)
# 'data' argument should be numeric matrix where columns are variables and
rows are specimens
# in the case of linear measurements, variables should be log transformed
before the function call
#
# 'factor' argument is a factor of the length same as the number of rows in
the 'data'
#
# 'nperm' is the number of permutations (resampling without replacement)
# PCA is based on log transformed variables and their covariance matrix (not
correlation)
dist <- numeric (nperm + 1) # empty vector of length equal to number of
permutations + 1
n <- length (factor) # number of specimens
n.vars <- ncol (data) # number of raw variables (columns of the 'data')
if (nrow (data) != n) stop ("number of rows in 'data' should be the same as
the length of 'factor'")
groups <- split(data, factor) # split the 'data' according to levels of
'factor'
# function that calculates angle between 2 vectors
Angle <- function (v1, v2) {
# written by Julien Claude (2008)
# sum (v1 * v2) = dot (inner) product of vectors v1 and v2; same as v1 %*%
v2
# sqrt(sum(V1^2)) = norm of vector v1
# sqrt(sum(V2^2)) = norm of vector v2
# acos(x) = arc-cosine of x
# angle is in degrees!
90 - abs ((180 * (acos (sum (v1 * v2) / (sqrt (sum (v1^2)) * sqrt (sum
(v2^2))))) / pi) - 90)
}
# Step 1. Find the 1st eigenvectors for the 2 original
# data sets, and compute the angle between
# the 2 first eigenvectors, theta.
pc1.obs <- lapply (groups, function (x) {
prcomp(x[,1:n.vars], retx = TRUE, center = TRUE, scale. =
FALSE)$rotation[,"PC1"]
})
if (!identical (names (pc1.obs[[1]]), names (pc1.obs[[2]]))) stop ("PC1
loadings in two vectors are not in the same order")
dist [1] <- abs (Angle (pc1.obs[[1]], pc1.obs[[2]])) # observed ABSOLUTE
angle between PC1 vectors (PC1 loadings)
# Step 2. Combine the 2 data sets, and randomly
# partition total sample size n1 + n2 into 2
# new data sets, of sizes n1 and n2. Find
# the 1st eigenvectors for the 2 new
# permuted data sets, and recompute the
# angle between the 2 new 1st eigenvectors, theta1.
#
# Step 3. Repeat step (2) a large number of times
# (N) to find a sample distribution of the test
# statistic, theta1.
for (i in 2:(nperm+1)){
# start from 2 as 1 is reserved for observed angle between PC1 vectors (PC1
loadings)
data.resample <- data [sample (x = 1:n, replace = FALSE), ] # resampling
without replacement
groups.temp <- split(data.resample, factor) # divide permuted data
according to levels of 'factor'
pc1.resample <- lapply(groups.temp, function (x) {
prcomp(x[,1:n.vars], retx = TRUE, center = TRUE, scale. =
FALSE)$rotation[,"PC1"]
}) # compute theta1 (H0 distribution)
if (!identical (names (pc1.obs[[1]]), names (pc1.obs[[2]]))) stop ("PC1
loadings in two vectors are not in the same order")
dist [i] <- abs (Angle (pc1.resample[[1]], pc1.resample[[2]])) # except
angle in d[1] (theta) all other angles (theta1) are under H0
}
# Significance. The permutation results involves calculating the proportion
# (P) of the observed theta1 values that are >= theta
# PLEASE CHECK IF pv IS CORRECTLY CALCULATED
pv <- sum (dist >= dist[1]) / (nperm+1)
# AS THE ORIGINAL PAPER STATED THIS:
# Since the orientation of the PCA vector is arbitrary,
# the absolute value of the angle computed is used (I use 'abs' function for
angles in the object 'dist').
# plot histogram of H0 distribution and observed angle
if (graph){
a <- hist (dist, breaks = 50, plot = FALSE)
hist(dist, breaks = 50, axes = FALSE, col = "lightseagreen",
xlim = range (a$breaks), ylim = range (a$counts),
main = "Distribution of angles between two vectors under H0 \nthat the
2 (eigen)vectors are the same",
ylab = "Number of permutations",
xlab = expression (paste ("Angle"," (",bolditalic (theta*degree), ")",
sep = "")))
axis (side = 1, at = seq (0, max (a$breaks), by = 1), las = 1)
axis (side = 2)
arrows (x0 = dist[1], y0 = ceiling (max(a$counts/100)) * 100/4,
x1 = dist[1], y1 = 0, col = "red", lwd = 3.5)
text (x = dist[1],
y = ceiling (max(a$counts/100)) * 100/4,
pos = 3,
labels = paste ("Observed\nangle:\n", round (dist[1], 3), ";\n", "p <
", round (pv, 3)))
}
res <- list (angle = dist [1] , p = pv, distribution = dist)
res
}
## Perform two tests on the dataset:
#Test 1:
library (ade4)
data ("tortues")
turtles <- log (tortues[,1:3])
sex <- factor (tortues$sexe)
set.seed (34)
coli.males <- coli (prcomp(turtles[sex == "M", ])[[2]][,1], rep(sqrt(1/3),3),
graph = TRUE, nperm = 10000)
set.seed (34)
coli.females <- coli (prcomp(turtles[sex == "F", ])[[2]][,1], rep(sqrt(1/3),3),
graph = TRUE, nperm = 10000)
res <- rbind (unlist (coli.males), unlist (coli.females))
row.names(res) <- c ("males", "females")
res
> angle p
> males 7.843779 0.0099
> females 6.831560 0.0078
#Test 2:
set.seed (34)
test2 <- coli.perm (data = turtles, factor = sex, nperm = 10000, graph = TRUE)
unlist (test2[1:2])
> angle p
> 7.37502965 0.00129987
#Test 1 (coli function) applied to PC1 loadings of both sexes (not sex-specific
PC1 vector vs. theoretical isometric vector):
set.seed (34)
try <- coli (ev1 = prcomp(turtles[sex == "M", ])[[2]][,1],
ev2 = prcomp(turtles[sex == "F", ])[[2]][,1], graph = TRUE, nperm =
10000)
unlist (try)
> angle p
> 7.37503 0.00880