Dear Morphmetricians,

I want to compare statistically if two vectors of PC1 loadings differ in 
terms of their direction which can be measured by the angle between them. 

Over the years similar questions appear over morphomet (e.g. thread1 
<https://www.mail-archive.com/[email protected]/msg02270.html>, 
thread2 
<https://www.mail-archive.com/[email protected]/msg02148.html>, 
...) but they are mostly related to geometric morphometrics approaches 
(e.g. approach used in MorphoJ, part of PTA in geomorph, or customized 
approaches provided in papers outlined in this Dean Adams' post 
<https://www.mail-archive.com/[email protected]/msg02276.html>). 
It might be important to mention that MorphoJ and geomorph treat the null 
hypothesis (H0) differently when comparing angles (e.g. see the post of 
<https://www.researchgate.net/post/Does-anyone-have-experience-with-vector-analysis-and-3D-data-in-MorphoJ>Miriam
 
L. Zelditch on RG 
<https://www.researchgate.net/post/Does-anyone-have-experience-with-vector-analysis-and-3D-data-in-MorphoJ>),
 
so one should be aware of potential differences in results even though the 
same data have been used.

In this post, I would like to compare if males and females differ in their 
(multivaraite) size which is defined as a vector of PC1 loadings obtained 
from decomposition of covariance matrix (not correlation) of log 
transformed linear distances (Klingenberg, 1996 
<https://morphometrics.uk/PDF_files/WhiteBook.pdf>). Such PC1 vector of 
logged variables corresponds to a growth-axis, recording both isometric and 
allometric variations. If groups (e.g. sexes) share the same overall size - 
PC1 loadings of sexes have similar direction statistically speaking - it is 
reasonable to treat PC2-PCn as descriptors of overall shape (e.g. they 
correspond to variation not explained by general growth) of animals and 
"analytical" size-correction (e.g. PCA shearing method) is possible, yet 
this approach has limitations (see McCoy et al., 2006 
<https://www.researchgate.net/publication/7176310_Size_correction_Comparing_morphological_traits_among_populations_and_environments>
 among 
others). 

If I understood correctly, two questions (or assumptions) are important 
here:

1) Is a vector of PC1 loadings for each sex isometric (in other words 
colinear?)

2) PC1 loadings between sexes are pointing in same direction as otherwise 
"size-correction" will be biased just because groups do not share common 
size axis.

*The first assumption* is explained and coded in the book "Morphometrics 
with R" 
<https://www.researchgate.net/publication/258885152_Morphometrics_With_R> 
written 
by Julien Claude (pages 103-104) 
<https://www.researchgate.net/publication/350439729_Rfunctions1_R_functions_for_geometric_morphometrics_developed_in_Claude_J_2008_Morphometrics_with_R?_sg%5B0%5D=R6_3W-64FZ3h43VzvFZh02awzYTBePhLB2hxLM3Tpoqeb0H2D5-xl4ZgiW5kn_xyQX7vqs_v4qrB67eqEXAOF9LseEVn3jo8NDHNJ0D0.NRfonsjUFiXnahK4ajI4NPb-t5OufVpHlO8TyNgfZo2v1f5dXnNe-hpILpjNZFCtMIgVaPq0wY7TsjSU9RAdmw>.
 
Function name is "coli" and according to text from the book it evaluates a 
possible statistic for isometry. More specifically, the function tests if 
sex-specific PC1 loadings differ from theoretical isometric vector (*i*) 
which has the same length (number of elements, *p*) as the length of PC1 
loadings but all values in *i* are identical to 1/sqrt (p). I will treat 
this as a "Test 1".

*The second assumption *can be tested with permutations explained in the 
work of Tzeng and Yeh (1999) 
<https://zoolstud.sinica.edu.tw/Journals/38.1/10.pdf> which used 
randomization in order to access statistical significance of observed angle 
between two vectors. I will treat this test as as a "Test 2".

For the demonstration purpose and reproducibility I will use dataset of 
three linear measurements on 48 turtles (24 males and 24 females) (Jolicoeur 
and Mosimann, 1960 <http://pbil.univ-lyon1.fr/ADE-4/ade4-html/tortues.html>). 
Codes for functions (coli and coli.perm) as well as downstream analyses 
with reproducible results are in the "functions and example.txt" file 
attached in this email. 

-------------------------------------------------------------

run the code form the  "functions and example.txt" file attached

or see results provided in the file.

-------------------------------------------------------------

*Questions:*

1. Is the H0 for Test 1 as follows: "Two vectors are not colinear (not 
similar) no more than expected by chance (randomness)?" Likewise, if we 
reject H0, we can say that two vectors are colinear and PC1 vector follows 
theoretical isometry. Is this correct? What is puzzling me is the part 
related to calculation of p value: it is "H0 < observed" but not "H0 > 
observed"... 

2. In accordance to Test 2, H0 is defined as: "The null hypothesis that we 
want to test is that the 2 first eigenvecters are the same (pointing in the 
same direction?)." If so, given that we rejected H0 in "test2", we can 
conclude that males and females do not share the common size axis. 
Therefore, if one would like to use PCA shearing method it would be 
problematic here and other methods might be used instead each having it's 
own limitation (CPCA, Burnaby-Back-Projection and so on...).

3. If we compare "test2" and "try" objects (PC1 males vs. PC1 females) we 
came up with similar p values (test2: p = 0.0013 and try: p = 0.009) but 
their conclusions are not the same. For instance, if we apply verbal 
argumentation Test 2 says "PC1 vectors of sexes are not pointing in the 
same direction" and test "try" says "PC1 vectors of sexes are colinear". 
This sounds counter-intuitive... or not. I am kind of lost what are H0 for 
those tests. Does this test "try" have any relations to the "test2" at all 
in terms of conclusions and are they even comparable?

-------------------------------------------------------------

I appreciate any input.

Kind regards,

Marko

-- 
You received this message because you are subscribed to the Google Groups 
"Morphmet" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/morphmet2/9c418b9f-6a10-4f08-adb9-91453481e724n%40googlegroups.com.
## Functions
# Test 1
coli <- function (ev1, ev2, nperm = 1000, graph = TRUE) {
          # coli function written by Julien Claude (2008)
          # this one is updated version from Julien Claude RG
          # 
https://www.researchgate.net/publication/350439729_Rfunctions1_R_functions_for_geometric_morphometrics_developed_in_Claude_J_2008_Morphometrics_with_R
          
          dist <- numeric (nperm)
          n <- length (ev1)

          Angle <- function (v1, v2) {
          # angle is in degrees not radians
          90 - abs ((180 * (acos (sum (v1 * v2) / (sqrt (sum (v1^2)) * sqrt 
(sum (v2^2))))) / pi) 
         - 90)
          }

          for (i in 1:nperm) {# random vectors of size n generated from uniform 
distribution
          X1 <- runif (n, -1, 1)
          X2 <- runif (n, -1, 1)
          dist[i] <- Angle (X1, X2)
          }
          
          zobs <- Angle(ev1, ev2) # observed angle

          # significance
          pv <- length (dist[dist < zobs]) / nperm # IN THE BOOK originally it 
was 
          # "length(dist[dist>zobs])/nperm" but current version is after errata 
1.81
          
          if (graph) {
          hist (dist, breaks = 50,
                main = "Distribution of angles between 2 random vectors", xlab 
= "Z statistic", 
          ylab = "# of random vectors", sub = paste ("Actual z-obs =",round 
(zobs, 5),": 
          p<", round (pv), 5))
            abline (v = zobs)
          }
          
          list (angle = zobs, p = pv)
}

# Test 2
coli.perm <- function (data, factor, nperm = 10000, graph = TRUE) {

  # coli.perm function written by Marko Djurakic
  # function based on permutations according to Tzeng and Yeh (1999)
  # 'data' argument should be numeric matrix where columns are variables and 
rows are specimens
  # in the case of linear measurements, variables should be log transformed 
before the function call
  # 
  # 'factor' argument is a factor of the length same as the number of rows in 
the 'data'
  # 
  # 'nperm' is the number of permutations (resampling without replacement)
  
  # PCA is based on log transformed variables and their covariance matrix (not 
correlation) 
  
  dist <- numeric (nperm + 1) # empty vector of length equal to number of 
permutations + 1
  n <- length (factor) # number of specimens
  n.vars <- ncol (data) # number of raw variables (columns of the 'data')
  
  if (nrow (data) != n) stop ("number of rows in 'data' should be the same as 
the length of 'factor'")
  
  groups <- split(data, factor) # split the 'data' according to levels of 
'factor'
  
  # function that calculates angle between 2 vectors
  Angle <- function (v1, v2) {
    # written by Julien Claude (2008)
    # sum (v1 * v2) = dot (inner) product of vectors v1 and v2; same as v1 %*% 
v2
    # sqrt(sum(V1^2)) = norm of vector v1
    # sqrt(sum(V2^2)) = norm of vector v2
    # acos(x) = arc-cosine of x
    # angle is in degrees!
    90 - abs ((180 * (acos (sum (v1 * v2) / (sqrt (sum (v1^2)) * sqrt (sum 
(v2^2))))) / pi) - 90)
  }
  
  # Step 1. Find the 1st eigenvectors for the 2 original
  # data sets, and compute the angle between
  # the 2 first eigenvectors, theta.
  pc1.obs <- lapply (groups, function (x) {
    prcomp(x[,1:n.vars], retx = TRUE, center = TRUE, scale. = 
FALSE)$rotation[,"PC1"]
  })
  
  if (!identical (names (pc1.obs[[1]]), names (pc1.obs[[2]]))) stop ("PC1 
loadings in two vectors are not in the same order")
  
  dist [1] <-  abs (Angle (pc1.obs[[1]], pc1.obs[[2]])) # observed ABSOLUTE 
angle between PC1 vectors (PC1 loadings)
  
  # Step 2. Combine the 2 data sets, and randomly
  # partition total sample size n1 + n2 into 2
  # new data sets, of sizes n1 and n2. Find
  # the 1st eigenvectors for the 2 new
  # permuted data sets, and recompute the
  # angle between the 2 new 1st eigenvectors, theta1.
  # 
  # Step 3. Repeat step (2) a large number of times
  # (N) to find a sample distribution of the test
  # statistic, theta1.
  for (i in 2:(nperm+1)){
    # start from 2 as 1 is reserved for observed angle between PC1 vectors (PC1 
loadings)
    data.resample <- data [sample (x = 1:n, replace = FALSE), ] # resampling 
without replacement
    groups.temp <- split(data.resample, factor) # divide permuted data 
according to levels of 'factor'
    
    pc1.resample <- lapply(groups.temp, function (x) {
      prcomp(x[,1:n.vars], retx = TRUE, center = TRUE, scale. = 
FALSE)$rotation[,"PC1"]
    }) # compute theta1 (H0 distribution)
    
    if (!identical (names (pc1.obs[[1]]), names (pc1.obs[[2]]))) stop ("PC1 
loadings in two vectors are not in the same order")
    
    dist [i] <-  abs (Angle (pc1.resample[[1]], pc1.resample[[2]])) # except 
angle in d[1] (theta) all other angles (theta1) are under H0
  }
  
  # Significance. The permutation results involves calculating the proportion
  # (P) of the observed theta1 values that are >= theta
  
  # PLEASE CHECK IF pv IS CORRECTLY CALCULATED
  pv <- sum (dist >= dist[1]) /  (nperm+1) 
  # AS THE ORIGINAL PAPER STATED THIS:
  # Since the orientation of the PCA vector is arbitrary,
  # the absolute value of the angle computed is used (I use 'abs' function for 
angles in the object 'dist').
  
  # plot histogram of H0 distribution and observed angle
  if (graph){
    a <- hist (dist, breaks = 50, plot = FALSE)
    hist(dist, breaks = 50, axes = FALSE, col = "lightseagreen",
         xlim = range (a$breaks), ylim = range (a$counts),
         main = "Distribution of angles between two vectors under H0 \nthat the 
2 (eigen)vectors are the same",
         ylab = "Number of permutations",        
         xlab = expression (paste ("Angle"," (",bolditalic (theta*degree), ")", 
sep = "")))
    axis (side = 1, at = seq (0, max (a$breaks), by = 1), las = 1)
    axis (side = 2)
    
    arrows (x0 = dist[1], y0 = ceiling (max(a$counts/100)) * 100/4,
            x1 = dist[1], y1 = 0, col = "red", lwd = 3.5)
    
    text (x = dist[1],
          y = ceiling (max(a$counts/100)) * 100/4,
          pos = 3,
          labels = paste ("Observed\nangle:\n", round (dist[1], 3), ";\n", "p < 
", round (pv, 3)))
    
  }
  
  res <- list (angle = dist [1] , p = pv, distribution = dist)
  res
}

## Perform two tests on the dataset:
#Test 1:

library (ade4)

data ("tortues")
turtles <- log (tortues[,1:3])
sex <- factor (tortues$sexe)

set.seed (34)
coli.males <- coli (prcomp(turtles[sex == "M", ])[[2]][,1], rep(sqrt(1/3),3), 
graph = TRUE, nperm = 10000)

set.seed (34)
coli.females <- coli (prcomp(turtles[sex == "F", ])[[2]][,1], rep(sqrt(1/3),3), 
graph = TRUE, nperm = 10000)
res <- rbind (unlist (coli.males), unlist (coli.females))
row.names(res) <- c ("males", "females")
res


>          angle     p
>  males   7.843779  0.0099
>  females 6.831560  0.0078


#Test 2:

set.seed (34)
test2 <- coli.perm (data = turtles, factor = sex, nperm = 10000, graph = TRUE)
unlist (test2[1:2])

> angle       p 
> 7.37502965  0.00129987

#Test 1 (coli function) applied to PC1 loadings of both sexes (not sex-specific 
PC1 vector vs. theoretical isometric vector):

set.seed (34)
try <- coli (ev1 = prcomp(turtles[sex == "M", ])[[2]][,1],
          ev2 = prcomp(turtles[sex == "F", ])[[2]][,1], graph = TRUE, nperm = 
10000)
unlist (try)

> angle    p 
> 7.37503  0.00880 

Reply via email to