Re: [MORPHMET2] How to analyze actual procrustes distance among specimens

Mike Collyer Mon, 17 Oct 2022 20:07:47 -0700

Dear Paolo,

Regarding your comment,
> 
> Thus when you say "Second, the method Paolo is outlining is not Procrustes 
> distance from GPA" it must be noted that one can freely compute "Euclidean 
> distances between observed shapes after GPA alignment" but the sole 
> "Procrustes" Distances computed are those from real shapes and the consensus 
> while the others are merely (there is any diminutio here) Euclidean distance. 
> Thus....I'm just not sure they can be called "Procrustes Distances" as PD 
> intrinsically removes rotations via OPA. Of course, numerically speaking, 
> they are, in the majority of biological applications, ludicrously close.


The way I interpret the difference between Procrustes distance and Euclidean 
distance is based on whether the space in which they are calculated is 
Euclidean.  They are both the square root of the inner-product of a vector of 
differences between corresponding coordinates of two configurations.  The 
Euclidean distance is calculated following projection into the space tangent to 
shape space. I do not interpret Procrustes distance to strictly require OPA 
between two configurations; rather, I rely on (my interpretation of) Fred 
Bookstein’s description of Procrustes distance as the square root of the inner 
product of a vector of differences (the summed squared differences) between 
corresponding landmark positions in two *optimally superimposed* 
configurations.  I suspect that you interpret “optimal" to mean strictly the 
outcome of OPA between two configurations, perhaps because two configurations 
could not be better aligned to each other than they could be via OPA?  However, 
with three or more configurations, “optimal” might (and probably usually) means 
something different, like the best alignment of each configuration to the 
consensus configuration, even if that means for any pair of measured 
configurations, the variance between corresponding landmarks is not minimized 
(because optimization minimizes the variance with respect to all pairs of each 
configuration matched to the consensus).  

I think “optimal” is perhaps subjective.  For example, some arbitrary choices 
might have to be made with sliding semilandmarks if convergence is not possible 
with GPA, by maybe arresting an analysis after, e.g., 3, 5, 10, or 50 
iterations.  Different choice of iterations might and probably will yield 
alternative Procrustes residuals, all of which are assumed to have been 
optimally superimposed.  Nevertheless, if the inner-product of vectors of 
differences between Procrustes residuals are calculated for all pairwise 
comparisons, before projection, I would consider the square-roots of these 
values as Procrustes distances (via GPA, the method by which optimal 
superimposition was defined).  Adding an algorithm to find the pairwise OPA 
between configurations in order to calculate Procrustes distance means the 
optimal superimposition strategy was abandoned, in my opinion.  This is what I 
meant by “not Procrustes distance from GPA”.  Therefore, in my opinion, both 
distance between two configurations aligned by OPA and distance between two 
configurations aligned with GPA could be called Procrustes distance, but what 
is regarded as the optimal superimposition (rotation, specifically) is 
different.

Regarding the convergence of methods for empirical datasets (similar PCA on 
Procrustes residuals or PCoA on pairwise Procrustes distances), I guess it is 
probably the case when alignment to the mean is consistent with alignment to 
any other configuration, which is also aligned to the mean.  There might be 
cases where several configurations are similar and would have a better 
alignment with each other than with a mean that is also based in part on 
configurations that are quite different.  (I tend to think of an example with 
fishes with fins moving all over the lateral side of a fish body.)  Perhaps 
pairwise OPA would yield perceptibly different Procrustes distances than a 
global/GPA solution, but I have trouble believing that the rank-order of 
distances could be drastically altered.  I do believe we are splitting hairs 
with this pedantic excursion.

Speaking of splitting hairs, if someone should have a large data set and wish 
to calculate pairwise Procrustes distances based on OPA rather than those based 
on a global superimposition strategy (like GPA) after having results from GPA 
(like with gpagen), below is a useful time-saving tool to do this in R.  I show 
this with the example Paolo provided:

> system.time(pairwisedist <- paireucdist(plethodon$land)) # Paolo’s function

   user  system elapsed 
  0.857   0.221   1.315 

# Convergence of methods, using gpagen.
# First, assume GPA is performed

> system.time(Y.gpa <-gpagen(plethodon$land, PrinAxes = FALSE, verbose = TRUE, 
> Proj = FALSE, print.progress = FALSE))
   user  system elapsed 
  0.007   0.000   0.007 

# Note if we do GPA again, just with the first two specimens, and look at the 
Procrustes distance

> gpagen(Y.gpa$coords[,, 1:2], print.progress = FALSE, verbose = TRUE)$procD
           1
2 0.06353302

# It’s the same as the paireucdist results

> pairwisedist[1,]
                             res 
1.00000000 2.00000000 0.06353302 


# The following is a function to perform OPA for each pairwise result.
# It relies on configurations already having centroid size (no scaling required)
# (Rescaling would be a simple step to add, if desired).
# It produces a distance matrix, like the dist function in R would.

> pairwise.pd <- function(GPA){
+     dims <- dim(GPA$coords)
+     k <- dims[2]
+     p <- dims[1]
+     n <- dims[3]
+     ind <- combn(n, 2)
+     
+     res <- lapply(1:ncol(ind), function(j){
+         x <- ind[, j]
+         Y1 <- GPA$coords[,, x[1]]
+         Y2 <- GPA$coords[,, x[2]]
+         V <- crossprod(Y1, Y2)
+         sv <- La.svd(V, k, k)
+         u <- sv$u; u[,k] <- u[,k]*determinant(V)$sign
+         Yr <- tcrossprod(Y2, u %*% sv$vt)
+         sqrt(sum((Y1 - Yr)^2))
+     })
+     
+     res <- unlist(res)
+     D <- as.dist(matrix(0, n, n))
+     D[1:length(res)] <- res
+     return(D)
+ }
> 

# Proof that it does the same thing

> system.time(PD <- pairwise.pd(Y.gpa))
   user  system elapsed 
  0.044   0.003   0.054 

> cbind(pairwisedist, PD)[1:10,]
                  res         PD
 [1,] 1  2 0.06353302 0.06353302
 [2,] 1  3 0.07727369 0.07727369
 [3,] 1  4 0.07374885 0.07374885
 [4,] 1  5 0.06326163 0.06326163
 [5,] 1  6 0.06412749 0.06412749
 [6,] 1  7 0.08359785 0.08359785
 [7,] 1  8 0.06413572 0.06413572
 [8,] 1  9 0.05978269 0.05978269
 [9,] 1 10 0.06478204 0.06478204
[10,] 1 11 0.10952660 0.10952660

> all.equal(pairwisedist[,3], as.vector(PD))
[1] TRUE

The combination between gpagen and this method of calculating pairwise 
Procrustes distance took about 0.06 seconds as opposed to the 1.3 seconds 
required by the paireucdist function, which has more options, obviously.  Not a 
big deal here.  However, if someone has a large data set (unlike this simple 
one) on which GPA has already been performed, and coordinate data have unit 
centroid size, relying on the quick OPA algorithm without scaling 
configurations might take only seconds or minutes versus hours.  (The same 
function with a rescaling step increased the computation time by 0.03 seconds 
for me.  Thus, it is probably just faster because of having performed GPA, 
already.)

Warm regards!

Mike

> On Oct 17, 2022, at 5:04 PM, Paolo Piras <[email protected]> wrote:
> 
> Thanks Mike for some clarifications about gpagen().
> As for "pairwise" terminology...starting from the end of your message 
> "Whether the other n - 2 configurations do not matter might be interesting to 
> discuss.": as I said..it depends from the applications: in the case or 
> Parallel Transport, Geodesics, auto-parallel lines, Geodesics 
> splines...etc...and all application where OPA (Ordinary Procrustes Analysis) 
> between pairs of shapes matters ..one is forced to compute pairwise 
> Procrustes Distance (PD) at various steps of the above mentioned procedures. 
> PD is something that is always between two configurations: in the case of  
> GPA one is always the consensus and the others are real observed shapes.  
> Thus when you say "Second, the method Paolo is outlining is not Procrustes 
> distance from GPA" it must be noted that one can freely compute "Euclidean 
> distances between observed shapes after GPA alignment" but the sole 
> "Procrustes" Distances computed are those from real shapes and the consensus 
> while the others are merely (there is any diminutio here) Euclidean distance. 
> Thus....I'm just not sure they can be called "Procrustes Distances" as PD 
> intrinsically removes rotations via OPA. Of course, numerically speaking, 
> they are, in the majority of biological applications, ludicrously close.
> It is possible my argument suffers from some terminological purity lack but 
> the substance is there I guess. 
> "I wonder how one sees the forest from the trees?" I admit this pushed me to 
> do something I never did: I performed a Principal coordinates analysis on the 
> pairwise Procrustes Distance (via pairwise OPA); thus there is no GPA, no 
> consensus estimation. The resulting ordination space cannot be closer to the 
> pca ordination space after GPA alignment. Maybe this was shown in old 
> blue-book or white-book or Fred 1991 book but I'm not sure at all.  
> Of course we cannot restore ANY shape from Principal Coordinates vectors 
> extremes. For that operation we need the mean shape coordinates, i.e. the 
> consensus of the GPA!.
> Anyway..I think all this little discussion is useful and interesting, 
> independently from the fact that, operationally, does not change anything in 
> the common practice. 
> I always benefited from this kind of discussions as I always learned new 
> things in areas where (erroneously) I felt confident. 
> All the best
> Paolo
> 
> just append these lines to my previous code.
> 
> if (!requireNamespace("BiocManager", quietly = TRUE))
>     install.packages("BiocManager")
> BiocManager::install("hopach") 
> library(hopach)
> pairmat<-dissmatrix(pairwisedist[,3]) #### hopach was needed for using this 
> function...
> library(ape)
> ord<-pcoa(pairmat)
> ord2<-procSym(plethodon$land)
> par(mfrow=c(1,2))
> plot(ord2$PCscores[,1:2],asp=1)
> plot(ord$vectors[,1:2],asp=1)
> 
> 
> 
> 
> 
> 
> 
> 
> Il giorno lun 17 ott 2022 alle ore 19:30 Mike Collyer <[email protected] 
> <mailto:[email protected]>> ha scritto:
> For what it’s worth…
> 
> I have neither tried to fully penetrate Paolo’s code nor the technical 
> details in the paper he sent, but I did notice his code used the default of 
> gpagen to project Procrustes residuals into tangent space.  Therefore, the 
> density plot results appear to confound methodological differences and 
> whether configurations were projected into tangent space.  Updating that 
> portion of the code (not projecting Procrustes residuals) did not seem to 
> resolve anything, and I would not expect it, since the Procrustes distances 
> were measured among specimens after alignment to the mean configuration via 
> GPA, not alignment between pairs of configurations.  However, an interesting 
> observation.  The “distortion” between distances from the pairwise approach 
> Paolo provided code for and the distances from the GPA-aligned configurations 
> is on par with the distortion one would expect from projection into tangent 
> space; i.e., so small in this empirical example as to be rounding error if 
> using more than 6 decimal places.
> 
> It is an interesting debatable point.  Calculating Procrustes distances by 
> pairs means that the same result will be found between pairs, irrespective of 
> whether a full data set is expanded or subsetted (in which case the means 
> could change, and thus, the Procrustes residuals could also change).  
> Alternatively, without the mean, without projection into a tangent space, and 
> without visualization of points in the space, perhaps with a mapping of shape 
> change (transformation grids), I wonder how one sees the forest from the 
> trees?
> 
> Anyway, to confirm, yes, gpagen uses an “Euclidean fashion” to calculate 
> distances based on Procrustes residuals that have been obtained via GPA and 
> does not subset the data to all possible n(n - 1) / 2 pairs for n specimens, 
> with an extra rotation step in each case.  (I did confirm that after 
> performing GPA with gpagen, and then performing it again on the Procrustes 
> residuals obtained only for, e.g., specimens 1 and 2, specimens 1 and 3, …, I 
> get the exact same results as obtained from Paolo’s code, meaning the 
> pairwise approach is GPA performed on Procrustes residuals from GPA, by 
> pairs.)  So two important points from this.  First, in gpagen, I should have 
> been clear before that the Procrustes distances if calculated on projected 
> residuals are actually Euclidean distances in the tangent space.  Do not 
> project if you wish to have Procrustes distances.  Second, the method Paolo 
> is outlining is not Procrustes distance from GPA.  It is Procrustes distance 
> based only on two shapes, which is tantamount to performing GPA, and 
> subsequently realigning configurations as if all other n - 2 configurations 
> do not matter.  Whether the other n - 2 configurations do not matter might be 
> interesting to discuss.
> 
> Cheers!
> Mike
> 
> 
> 
>> On Oct 17, 2022, at 11:51 AM, Paolo Piras <[email protected] 
>> <mailto:[email protected]>> wrote:
>> 
>> It is very probable [correct me if I'm wrong, but see the last line of code 
>> below] that gpagen(), and other functions from other packages, computes 
>> pairwise distances in the Euclidean fashion after gpa alignment. While there 
>> is nothing wrong in the Euclidean fashion (in most cases) I think that these 
>> distances still contain rotations, albeit very small. In fact...aligned 
>> specimens are aligned with the consensus but not among them. At any pairwise 
>> comparison one should align each pair. In fact, the concept of "alignment" 
>> pertains to single pairs. This will have ***very little*** impact on the 
>> majority of cases and my message is just a technicality. However, if one 
>> wants to perform some kind of parallel transport in the shape space or size 
>> and shape space he/she should be aware about this. In particular, see 
>> section 3 in the attached paper. 
>> Let's try this with the help of an ad hoc written function (that allows both 
>> Euclidean/Kendall option as well as shape-space/size-and-shape-space option) 
>> that is a little bit slow but works...
>> There are also some ancillary functions...
>> All the best 
>> Paolo
>> 
>> 
>> Just select all and run
>> 
>> library(geomorph)
>> library(Morpho)
>> library(shapes)
>> library(Matrix)
>> data(plethodon) 
>> 
>> array2mat<-function(array,inds,vars){
>>   
>> if(class(array)=="matrix"){array<-array(array,dim=c(nrow(array),ncol(array),1))}
>>   X1 <-aperm(array,c(3,2,1))
>>   dim(X1)<- c(inds, vars)
>>   
>> if(!is.null(dimnames(array)[3])){rownames(X1)<-unlist(dimnames(array)[3])}else{rownames(X1)<-c(1:nrow(X1))}
>>   return(X1)
>> }
>> scaleshapes<-function(array){
>>   k<-dim(array)[1]
>>   m<-dim(array)[2]
>>   n<-dim(array)[3]
>>   library(abind)
>>   library(shapes)
>>   library(Morpho)
>>   divmat<-function(mat){
>>     mat2<-mat/cSize(mat)
>>     mat2
>>   }
>>   
>> if(is.matrix(array)==T){array<-array(array,dim=c(nrow(array),ncol(array),1))}
>>   scaled<-vecx(t(apply(array,3,divmat)),revert=T,lmdim=m)
>>   return(scaled)
>> }
>> 
>> centroids<-function(array){
>>   meanmat<-function(mat){
>>     mean<-apply(mat,2,mean)
>>     mean
>>   }
>>   
>> if(class(array)[1]=="matrix"){array<-array(array,dim=c(dim(array)[1],dim(array)[2],1))}
>>   centroidi<-t(apply(array,3,meanmat))
>>   return(centroidi)
>> }
>> 
>> centershapes<-function(array,vectors=NULL){
>>   
>> if(is.matrix(array)==T){array<-array(array,dim=c(nrow(array),ncol(array),1))}
>>   if(is.null(vectors)==T){centros<-centroids(array)}else{centros=vectors}
>>   k<-dim(array)[1]
>>   m<-dim(array)[2]
>>   n<-dim(array)[3]
>>   prov<-array(rep(t(centros),each=k),dim=c(k,m,n))
>>   array2<-array-prov
>>   return(array2)
>> }
>> eucdist<-function(a,b,scale=T,center=T,doopa=T){
>>   if(center==T){
>>     a<-centershapes(a)[,,1]
>>     b<-centershapes(b)[,,1]
>>   }
>>   
>>   if(doopa==T){
>>     theopa<-rotonto(a,b,scale=F,reflection=F)
>>     a<-a
>>     b<-theopa$yrot}
>>   
>>   k<-nrow(a)
>>   m<-ncol(a)
>>   
>>   if(scale==T){
>>     a<-scaleshapes(array(a,dim=c(k,m,1)))[,,1]
>>     b<-scaleshapes(array(b,dim=c(k,m,1)))[,,1]
>>   }
>>   
>> dist<-dist(rbind(array2mat(array(a,dim=c(k,m,1)),1,k*m),array2mat(array(b,dim=c(k,m,1)),1,k*m)))
>>   dist 
>> }
>> 
>> paireucdist<-function(array,scale=T,doopa=T,euc=T,noneuc=c("kendall","ssriemdist")){
>>   confs<-combn(dim(array)[3],2)
>>   if(!is.null(dimnames(array)[[3]])){
>>     contrlist<-NULL
>>     for(i in 1:ncol(confs)){
>>       
>> contrveci<-paste(dimnames(array)[[3]][confs[,i][1]],dimnames(array)[[3]][confs[,i][2]],sep="/")
>>       contrlist<-c(contrlist,list(contrveci))
>>     }}
>>   res<-NULL
>>   for(i in 1:ncol(confs)){
>>     
>> if(euc==T){resi<-eucdist(array[,,confs[,i][1]],array[,,confs[,i][2]],scale=scale,doopa=doopa)}else{
>>       
>> if(noneuc[1]=="kendall"){resi<-kendalldist(array[,,confs[,i][1]],array[,,confs[,i][2]])}
>>       
>> if(noneuc[1]=="ssriemdist"){resi<-ssriemdist(array[,,confs[,i][1]],array[,,confs[,i][2]])}
>>     }
>>     res<-c(res,resi)
>>     print(paste(i,"of",length(confs)/2,sep=" "))
>>   }
>>   
>> if(is.null(dimnames(array)[[3]])){cbind(t(combn(dim(array)[3],2)),res)}else{
>>     data.frame(unlist(contrlist),res,apply(t(confs),2,paste))}
>> }
>> Y.gpa <-gpagen(plethodon$land, PrinAxes = FALSE,verbose = T)
>> mat<-Y.gpa$procD
>> 
>> ##Euclidean fashion
>> pairwisedist<-paireucdist(plethodon$land)
>> plot(pairwisedist[,3],mat,asp=1) ##### of course they are very close in this 
>> case
>> plot(density(pairwisedist[,3]/mat)) #### not identical
>> ##Kendall fashion
>> pairwisedist2<-paireucdist(plethodon$land,euc=F,noneuc="kendall")
>> plot(pairwisedist2[,3],mat,asp=1) ##### of course they are very close in 
>> this case
>> plot(density(pairwisedist2[,3]/mat)) #### not identical
>> 
>> ### scholastic
>> plot(density(pairwisedist2[,3]/pairwisedist[,3])) #### not identical
>> 
>> 
>> ### 
>> mat2<-dist(array2mat(Y.gpa$coords,40,24))
>> range(mat2-mat) #### this confirms that "pairwise" distances in gpagen() are 
>> computed without re-aligning each pair. 
>> 
>> Il giorno lun 17 ott 2022 alle ore 16:49 Dagmawit Getahun 
>> <[email protected] <mailto:[email protected]>> ha scritto:
>> Hi Mike,
>> 
>> Thank you so much! This was very helpful and I was able to get the 
>> distances. 
>> 
>> Best,
>> Dag 
>> 
>> On Sun, Oct 16, 2022 at 9:40 PM Mike Collyer <[email protected] 
>> <mailto:[email protected]>> wrote:
>> Dear Dag,
>> 
>> When running the GPA with the gpagen function in geomorph, use the argument 
>> verbose = TRUE and it will include the Procrustes distances among specimens 
>> as output. This can take a while with large data sets, which is why it is 
>> not set as a default. 
>> 
>> Cheers!
>> Mike
>> 
>> Sent from my iPhone
>> 
>>> On Oct 16, 2022, at 9:33 PM, Dagmawit Getahun <[email protected] 
>>> <mailto:[email protected]>> wrote:
>>> 
>>> Hi all,
>>> 
>>> I have a dataset of 3D cranial landmarks in the Morphologika file format as 
>>> a text file. I was able to upload it and run a GPA as well as PCA in R 
>>> using geomorph and I wanted to know if there was an R code to analyze the 
>>> actual Procrustes distances between the specimens in my dataset and whether 
>>> I can do that in the format that my data set is in? 
>>> 
>>> Thanks in advance for the help!
>>> 
>>> Best,
>>> Dag 
>>> 
>>> -- 
>>> You received this message because you are subscribed to the Google Groups 
>>> "Morphmet" group.
>>> To unsubscribe from this group and stop receiving emails from it, send an 
>>> email to [email protected] 
>>> <mailto:[email protected]>.
>>> To view this discussion on the web visit 
>>> https://groups.google.com/d/msgid/morphmet2/a9136b76-7691-4d80-9096-6c59afdfe8a4n%40googlegroups.com
>>>  
>>> <https://groups.google.com/d/msgid/morphmet2/a9136b76-7691-4d80-9096-6c59afdfe8a4n%40googlegroups.com?utm_medium=email&utm_source=footer>.
>> 
>> 
>> -- 
>> Dagmawit Abebe Getahun
>> Doctoral Candidate 
>> Physical Anthropology
>> The Graduate Center , CUNY
>> New York Consortium in Evolutionary Primatology
>> 
>> Adjunct Lecturer, Anthropology Department
>> Lehman College, CUNY 
>> 
>> 
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "Morphmet" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to [email protected] 
>> <mailto:[email protected]>.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/morphmet2/CALLJY8%2BejP-9F0-V-%3Dq4nSFADtBcbxb%3DOFVKhdLo2Y%3DE%2B7UJzA%40mail.gmail.com
>>  
>> <https://groups.google.com/d/msgid/morphmet2/CALLJY8%2BejP-9F0-V-%3Dq4nSFADtBcbxb%3DOFVKhdLo2Y%3DE%2B7UJzA%40mail.gmail.com?utm_medium=email&utm_source=footer>.
>> <Varano et al BIOMAT.pdf>
> 

-- 
You received this message because you are subscribed to the Google Groups 
"Morphmet" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/morphmet2/9BC64E8E-FA67-437A-879B-D66DFF469243%40gmail.com.

Re: [MORPHMET2] How to analyze actual procrustes distance among specimens

Reply via email to