Re: [MORPHMET2] How to analyze actual procrustes distance among specimens

Paolo Piras Tue, 18 Oct 2022 01:54:40 -0700

Thanks Mike for speeding up my solution. Maybe in your own function can be
implemented Riemannian Kendall and Le metrics (slower than Eucliudean,
however); as you said...the function assumes a preliminary GPA  which is
not desirable in some particular cases (not used in common practice) where
MOPA (Modified Ordinary Procrustes Analysis) and OPA must be alternated. I
will try do to this.
Best
Paolo



Il giorno mar 18 ott 2022 alle ore 05:07 Mike Collyer <[email protected]>
ha scritto:

> Dear Paolo,
>
> Regarding your comment,
>
>
> Thus when you say "Second, the method Paolo is outlining is not Procrustes
> distance from GPA" it must be noted that one can freely compute "Euclidean
> distances between observed shapes after GPA alignment" but the sole
> "Procrustes" Distances computed are those from real shapes and the
> consensus while the others are merely (there is any diminutio here)
> Euclidean distance. Thus....I'm just not sure they can be called
> "Procrustes Distances" as PD intrinsically removes rotations via OPA. Of
> course, numerically speaking, they are, in the majority of biological
> applications, ludicrously close.
>
>
> The way I interpret the difference between Procrustes distance and
> Euclidean distance is based on whether the space in which they are
> calculated is Euclidean.  They are both the square root of the
> inner-product of a vector of differences between corresponding coordinates
> of two configurations.  The Euclidean distance is calculated following
> projection into the space tangent to shape space. I do not interpret
> Procrustes distance to strictly require OPA between two configurations;
> rather, I rely on (my interpretation of) Fred Bookstein’s description of
> Procrustes distance as the square root of the inner product of a vector of
> differences (the summed squared differences) between corresponding landmark
> positions in two *optimally superimposed* configurations.  I suspect that
> you interpret “optimal" to mean strictly the outcome of OPA between two
> configurations, perhaps because two configurations could not be better
> aligned to each other than they could be via OPA?  However, with three or
> more configurations, “optimal” might (and probably usually) means something
> different, like the best alignment of each configuration to the consensus
> configuration, even if that means for any pair of measured configurations,
> the variance between corresponding landmarks is not minimized (because
> optimization minimizes the variance with respect to all pairs of each
> configuration matched to the consensus).
>
> I think “optimal” is perhaps subjective.  For example, some arbitrary
> choices might have to be made with sliding semilandmarks if convergence is
> not possible with GPA, by maybe arresting an analysis after, e.g., 3, 5,
> 10, or 50 iterations.  Different choice of iterations might and probably
> will yield alternative Procrustes residuals, all of which are assumed to
> have been optimally superimposed.  Nevertheless, if the inner-product of
> vectors of differences between Procrustes residuals are calculated for all
> pairwise comparisons, before projection, I would consider the square-roots
> of these values as Procrustes distances (via GPA, the method by which
> optimal superimposition was defined).  Adding an algorithm to find the
> pairwise OPA between configurations in order to calculate Procrustes
> distance means the optimal superimposition strategy was abandoned, in my
> opinion.  This is what I meant by “not Procrustes distance from GPA”.
> Therefore, in my opinion, both distance between two configurations aligned
> by OPA and distance between two configurations aligned with GPA could be
> called Procrustes distance, but what is regarded as the optimal
> superimposition (rotation, specifically) is different.
>
> Regarding the convergence of methods for empirical datasets (similar PCA
> on Procrustes residuals or PCoA on pairwise Procrustes distances), I guess
> it is probably the case when alignment to the mean is consistent with
> alignment to any other configuration, which is also aligned to the mean.
> There might be cases where several configurations are similar and would
> have a better alignment with each other than with a mean that is also based
> in part on configurations that are quite different.  (I tend to think of an
> example with fishes with fins moving all over the lateral side of a fish
> body.)  Perhaps pairwise OPA would yield perceptibly different Procrustes
> distances than a global/GPA solution, but I have trouble believing that the
> rank-order of distances could be drastically altered.  I do believe we are
> splitting hairs with this pedantic excursion.
>
> Speaking of splitting hairs, if someone should have a large data set and
> wish to calculate pairwise Procrustes distances based on OPA rather than
> those based on a global superimposition strategy (like GPA) after having
> results from GPA (like with gpagen), below is a useful time-saving tool to
> do this in R.  I show this with the example Paolo provided:
>
> > system.time(pairwisedist <- paireucdist(plethodon$land)) # Paolo’s
> function
>
>    user  system elapsed
>   0.857   0.221   1.315
>
> # Convergence of methods, using gpagen.
> # First, assume GPA is performed
>
> > system.time(Y.gpa <-gpagen(plethodon$land, PrinAxes = FALSE, verbose =
> TRUE, Proj = FALSE, print.progress = FALSE))
>    user  system elapsed
>   0.007   0.000   0.007
>
> # Note if we do GPA again, just with the first two specimens, and look at
> the Procrustes distance
>
> > gpagen(Y.gpa$coords[,, 1:2], print.progress = FALSE, verbose =
> TRUE)$procD
>            1
> 2 0.06353302
>
> # It’s the same as the paireucdist results
>
> > pairwisedist[1,]
>                              res
> 1.00000000 2.00000000 0.06353302
>
>
> # The following is a function to perform OPA for each pairwise result.
> # It relies on configurations already having centroid size (no scaling
> required)
> # (Rescaling would be a simple step to add, if desired).
> # It produces a distance matrix, like the dist function in R would.
>
> > pairwise.pd <- function(GPA){
> +     dims <- dim(GPA$coords)
> +     k <- dims[2]
> +     p <- dims[1]
> +     n <- dims[3]
> +     ind <- combn(n, 2)
> +
> +     res <- lapply(1:ncol(ind), function(j){
> +         x <- ind[, j]
> +         Y1 <- GPA$coords[,, x[1]]
> +         Y2 <- GPA$coords[,, x[2]]
> +         V <- crossprod(Y1, Y2)
> +         sv <- La.svd(V, k, k)
> +         u <- sv$u; u[,k] <- u[,k]*determinant(V)$sign
> +         Yr <- tcrossprod(Y2, u %*% sv$vt)
> +         sqrt(sum((Y1 - Yr)^2))
> +     })
> +
> +     res <- unlist(res)
> +     D <- as.dist(matrix(0, n, n))
> +     D[1:length(res)] <- res
> +     return(D)
> + }
> >
>
> # Proof that it does the same thing
>
> > system.time(PD <- pairwise.pd(Y.gpa))
>    user  system elapsed
>   0.044   0.003   0.054
>
> > cbind(pairwisedist, PD)[1:10,]
>                   res         PD
>  [1,] 1  2 0.06353302 0.06353302
>  [2,] 1  3 0.07727369 0.07727369
>  [3,] 1  4 0.07374885 0.07374885
>  [4,] 1  5 0.06326163 0.06326163
>  [5,] 1  6 0.06412749 0.06412749
>  [6,] 1  7 0.08359785 0.08359785
>  [7,] 1  8 0.06413572 0.06413572
>  [8,] 1  9 0.05978269 0.05978269
>  [9,] 1 10 0.06478204 0.06478204
> [10,] 1 11 0.10952660 0.10952660
>
> > all.equal(pairwisedist[,3], as.vector(PD))
> [1] TRUE
>
> The combination between gpagen and this method of calculating pairwise
> Procrustes distance took about 0.06 seconds as opposed to the 1.3 seconds
> required by the paireucdist function, which has more options, obviously.
> Not a big deal here.  However, if someone has a large data set (unlike this
> simple one) on which GPA has already been performed, and coordinate data
> have unit centroid size, relying on the quick OPA algorithm without scaling
> configurations might take only seconds or minutes versus hours.  (The same
> function with a rescaling step increased the computation time by 0.03
> seconds for me.  Thus, it is probably just faster because of having
> performed GPA, already.)
>
> Warm regards!
>
> Mike
>
> On Oct 17, 2022, at 5:04 PM, Paolo Piras <[email protected]> wrote:
>
> Thanks Mike for some clarifications about gpagen().
> As for "pairwise" terminology...starting from the end of your message
> "Whether the other n - 2 configurations do not matter might be interesting
> to discuss.": as I said..it depends from the applications: in the case or
> Parallel Transport, Geodesics, auto-parallel lines, Geodesics
> splines...etc...and all application where OPA (Ordinary Procrustes
> Analysis) between pairs of shapes matters ..one is forced to compute
> pairwise Procrustes Distance (PD) at various steps of the above mentioned
> procedures. PD is something that is always between two configurations: in
> the case of  GPA one is always the consensus and the others are real
> observed shapes.  Thus when you say "Second, the method Paolo is outlining
> is not Procrustes distance from GPA" it must be noted that one can freely
> compute "Euclidean distances between observed shapes after GPA alignment"
> but the sole "Procrustes" Distances computed are those from real shapes and
> the consensus while the others are merely (there is any diminutio here)
> Euclidean distance. Thus....I'm just not sure they can be called
> "Procrustes Distances" as PD intrinsically removes rotations via OPA. Of
> course, numerically speaking, they are, in the majority of biological
> applications, ludicrously close.
> It is possible my argument suffers from some terminological purity
> lack but the substance is there I guess.
> "I wonder how one sees the forest from the trees?" I admit this pushed me
> to do something I never did: I performed a Principal coordinates analysis
> on the pairwise Procrustes Distance (via pairwise OPA); thus there is no
> GPA, no consensus estimation. The resulting ordination space cannot be
> closer to the pca ordination space after GPA alignment. Maybe this was
> shown in old blue-book or white-book or Fred 1991 book but I'm not sure at
> all.
> Of course we cannot restore ANY shape from Principal Coordinates vectors
> extremes. For that operation we need the mean shape coordinates, i.e. the
> consensus of the GPA!.
> Anyway..I think all this little discussion is useful and interesting,
> independently from the fact that, operationally, does not change anything
> in the common practice.
> I always benefited from this kind of discussions as I always learned new
> things in areas where (erroneously) I felt confident.
> All the best
> Paolo
>
> just append these lines to my previous code.
>
> if (!requireNamespace("BiocManager", quietly = TRUE))
>     install.packages("BiocManager")
> BiocManager::install("hopach")
> library(hopach)
> pairmat<-dissmatrix(pairwisedist[,3]) #### hopach was needed for using
> this function...
> library(ape)
> ord<-pcoa(pairmat)
> ord2<-procSym(plethodon$land)
> par(mfrow=c(1,2))
> plot(ord2$PCscores[,1:2],asp=1)
> plot(ord$vectors[,1:2],asp=1)
>
>
>
>
>
>
>
>
> Il giorno lun 17 ott 2022 alle ore 19:30 Mike Collyer <[email protected]>
> ha scritto:
>
>> For what it’s worth…
>>
>> I have neither tried to fully penetrate Paolo’s code nor the technical
>> details in the paper he sent, but I did notice his code used the default of
>> gpagen to project Procrustes residuals into tangent space.  Therefore, the
>> density plot results appear to confound methodological differences and
>> whether configurations were projected into tangent space.  Updating that
>> portion of the code (not projecting Procrustes residuals) did not seem to
>> resolve anything, and I would not expect it, since the Procrustes distances
>> were measured among specimens after alignment to the mean configuration via
>> GPA, not alignment between pairs of configurations.  However, an
>> interesting observation.  The “distortion” between distances from the
>> pairwise approach Paolo provided code for and the distances from the
>> GPA-aligned configurations is on par with the distortion one would expect
>> from projection into tangent space; i.e., so small in this empirical
>> example as to be rounding error if using more than 6 decimal places.
>>
>> It is an interesting debatable point.  Calculating Procrustes distances
>> by pairs means that the same result will be found between pairs,
>> irrespective of whether a full data set is expanded or subsetted (in which
>> case the means could change, and thus, the Procrustes residuals could also
>> change).  Alternatively, without the mean, without projection into a
>> tangent space, and without visualization of points in the space, perhaps
>> with a mapping of shape change (transformation grids), I wonder how one
>> sees the forest from the trees?
>>
>> Anyway, to confirm, yes, gpagen uses an “Euclidean fashion” to calculate
>> distances based on Procrustes residuals that have been obtained via GPA and
>> does not subset the data to all possible n(n - 1) / 2 pairs for n
>> specimens, with an extra rotation step in each case.  (I did confirm that
>> after performing GPA with gpagen, and then performing it again on the
>> Procrustes residuals obtained only for, e.g., specimens 1 and 2, specimens
>> 1 and 3, …, I get the exact same results as obtained from Paolo’s code,
>> meaning the pairwise approach is GPA performed on Procrustes residuals from
>> GPA, by pairs.)  So two important points from this.  First, in gpagen, I
>> should have been clear before that the Procrustes distances if calculated
>> on projected residuals are actually Euclidean distances in the tangent
>> space.  Do not project if you wish to have Procrustes distances.  Second,
>> the method Paolo is outlining is not Procrustes distance from GPA.  It is
>> Procrustes distance based only on two shapes, which is tantamount to
>> performing GPA, and subsequently realigning configurations as if all other
>> n - 2 configurations do not matter.  Whether the other n - 2 configurations
>> do not matter might be interesting to discuss.
>>
>> Cheers!
>> Mike
>>
>>
>>
>> On Oct 17, 2022, at 11:51 AM, Paolo Piras <[email protected]> wrote:
>>
>> It is very probable [correct me if I'm wrong, but see the last line of
>> code below] that gpagen(), and other functions from other packages,
>> computes pairwise distances in the Euclidean fashion after gpa alignment.
>> While there is nothing wrong in the Euclidean fashion (in most cases) I
>> think that these distances still contain rotations, albeit very small. In
>> fact...aligned specimens are aligned with the consensus but not among them.
>> At any pairwise comparison one should align each pair. In fact, the concept
>> of "alignment" pertains to single pairs. This will have ***very little***
>> impact on the majority of cases and my message is just a technicality.
>> However, if one wants to perform some kind of parallel transport in the
>> shape space or size and shape space he/she should be aware about this. In
>> particular, see section 3 in the attached paper.
>> Let's try this with the help of an ad hoc written function (that allows
>> both Euclidean/Kendall option as well as shape-space/size-and-shape-space
>> option) that is a little bit slow but works...
>> There are also some ancillary functions...
>> All the best
>> Paolo
>>
>>
>> Just select all and run
>>
>> library(geomorph)
>> library(Morpho)
>> library(shapes)
>> library(Matrix)
>> data(plethodon)
>>
>> array2mat<-function(array,inds,vars){
>>
>> if(class(array)=="matrix"){array<-array(array,dim=c(nrow(array),ncol(array),1))}
>>   X1 <-aperm(array,c(3,2,1))
>>   dim(X1)<- c(inds, vars)
>>
>> if(!is.null(dimnames(array)[3])){rownames(X1)<-unlist(dimnames(array)[3])}else{rownames(X1)<-c(1:nrow(X1))}
>>   return(X1)
>> }
>> scaleshapes<-function(array){
>>   k<-dim(array)[1]
>>   m<-dim(array)[2]
>>   n<-dim(array)[3]
>>   library(abind)
>>   library(shapes)
>>   library(Morpho)
>>   divmat<-function(mat){
>>     mat2<-mat/cSize(mat)
>>     mat2
>>   }
>>
>> if(is.matrix(array)==T){array<-array(array,dim=c(nrow(array),ncol(array),1))}
>>   scaled<-vecx(t(apply(array,3,divmat)),revert=T,lmdim=m)
>>   return(scaled)
>> }
>>
>> centroids<-function(array){
>>   meanmat<-function(mat){
>>     mean<-apply(mat,2,mean)
>>     mean
>>   }
>>
>> if(class(array)[1]=="matrix"){array<-array(array,dim=c(dim(array)[1],dim(array)[2],1))}
>>   centroidi<-t(apply(array,3,meanmat))
>>   return(centroidi)
>> }
>>
>> centershapes<-function(array,vectors=NULL){
>>
>> if(is.matrix(array)==T){array<-array(array,dim=c(nrow(array),ncol(array),1))}
>>   if(is.null(vectors)==T){centros<-centroids(array)}else{centros=vectors}
>>   k<-dim(array)[1]
>>   m<-dim(array)[2]
>>   n<-dim(array)[3]
>>   prov<-array(rep(t(centros),each=k),dim=c(k,m,n))
>>   array2<-array-prov
>>   return(array2)
>> }
>> eucdist<-function(a,b,scale=T,center=T,doopa=T){
>>   if(center==T){
>>     a<-centershapes(a)[,,1]
>>     b<-centershapes(b)[,,1]
>>   }
>>
>>   if(doopa==T){
>>     theopa<-rotonto(a,b,scale=F,reflection=F)
>>     a<-a
>>     b<-theopa$yrot}
>>
>>   k<-nrow(a)
>>   m<-ncol(a)
>>
>>   if(scale==T){
>>     a<-scaleshapes(array(a,dim=c(k,m,1)))[,,1]
>>     b<-scaleshapes(array(b,dim=c(k,m,1)))[,,1]
>>   }
>>
>> dist<-dist(rbind(array2mat(array(a,dim=c(k,m,1)),1,k*m),array2mat(array(b,dim=c(k,m,1)),1,k*m)))
>>   dist
>> }
>>
>>
>> paireucdist<-function(array,scale=T,doopa=T,euc=T,noneuc=c("kendall","ssriemdist")){
>>   confs<-combn(dim(array)[3],2)
>>   if(!is.null(dimnames(array)[[3]])){
>>     contrlist<-NULL
>>     for(i in 1:ncol(confs)){
>>
>> contrveci<-paste(dimnames(array)[[3]][confs[,i][1]],dimnames(array)[[3]][confs[,i][2]],sep="/")
>>       contrlist<-c(contrlist,list(contrveci))
>>     }}
>>   res<-NULL
>>   for(i in 1:ncol(confs)){
>>
>> if(euc==T){resi<-eucdist(array[,,confs[,i][1]],array[,,confs[,i][2]],scale=scale,doopa=doopa)}else{
>>
>> if(noneuc[1]=="kendall"){resi<-kendalldist(array[,,confs[,i][1]],array[,,confs[,i][2]])}
>>
>> if(noneuc[1]=="ssriemdist"){resi<-ssriemdist(array[,,confs[,i][1]],array[,,confs[,i][2]])}
>>     }
>>     res<-c(res,resi)
>>     print(paste(i,"of",length(confs)/2,sep=" "))
>>   }
>>
>> if(is.null(dimnames(array)[[3]])){cbind(t(combn(dim(array)[3],2)),res)}else{
>>     data.frame(unlist(contrlist),res,apply(t(confs),2,paste))}
>> }
>> Y.gpa <-gpagen(plethodon$land, PrinAxes = FALSE,verbose = T)
>> mat<-Y.gpa$procD
>>
>> ##Euclidean fashion
>> pairwisedist<-paireucdist(plethodon$land)
>> plot(pairwisedist[,3],mat,asp=1) ##### of course they are very close in
>> this case
>> plot(density(pairwisedist[,3]/mat)) #### not identical
>> ##Kendall fashion
>> pairwisedist2<-paireucdist(plethodon$land,euc=F,noneuc="kendall")
>> plot(pairwisedist2[,3],mat,asp=1) ##### of course they are very close in
>> this case
>> plot(density(pairwisedist2[,3]/mat)) #### not identical
>>
>> ### scholastic
>> plot(density(pairwisedist2[,3]/pairwisedist[,3])) #### not identical
>>
>>
>> ###
>> mat2<-dist(array2mat(Y.gpa$coords,40,24))
>> range(mat2-mat) #### this confirms that "pairwise" distances in gpagen()
>> are computed without re-aligning each pair.
>>
>> Il giorno lun 17 ott 2022 alle ore 16:49 Dagmawit Getahun <
>> [email protected]> ha scritto:
>>
>>> Hi Mike,
>>>
>>> Thank you so much! This was very helpful and I was able to get the
>>> distances.
>>>
>>> Best,
>>> Dag
>>>
>>> On Sun, Oct 16, 2022 at 9:40 PM Mike Collyer <[email protected]>
>>> wrote:
>>>
>>>> Dear Dag,
>>>>
>>>> When running the GPA with the gpagen function in geomorph, use the
>>>> argument verbose = TRUE and it will include the Procrustes distances among
>>>> specimens as output. This can take a while with large data sets, which is
>>>> why it is not set as a default.
>>>>
>>>> Cheers!
>>>> Mike
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Oct 16, 2022, at 9:33 PM, Dagmawit Getahun <[email protected]>
>>>> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I have a dataset of 3D cranial landmarks in the Morphologika file
>>>> format as a text file. I was able to upload it and run a GPA as well as PCA
>>>> in R using geomorph and I wanted to know if there was an R code to analyze
>>>> the actual Procrustes distances between the specimens in my dataset and
>>>> whether I can do that in the format that my data set is in?
>>>>
>>>> Thanks in advance for the help!
>>>>
>>>> Best,
>>>> Dag
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Morphmet" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to [email protected].
>>>> To view this discussion on the web visit
>>>> https://groups.google.com/d/msgid/morphmet2/a9136b76-7691-4d80-9096-6c59afdfe8a4n%40googlegroups.com
>>>> <https://groups.google.com/d/msgid/morphmet2/a9136b76-7691-4d80-9096-6c59afdfe8a4n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>>>>
>>>
>>> --
>>> Dagmawit Abebe Getahun
>>> Doctoral Candidate
>>> Physical Anthropology
>>> The Graduate Center , CUNY
>>> New York Consortium in Evolutionary Primatology
>>>
>>> Adjunct Lecturer, Anthropology Department
>>> Lehman College, CUNY
>>>
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Morphmet" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/morphmet2/CALLJY8%2BejP-9F0-V-%3Dq4nSFADtBcbxb%3DOFVKhdLo2Y%3DE%2B7UJzA%40mail.gmail.com
>>> <https://groups.google.com/d/msgid/morphmet2/CALLJY8%2BejP-9F0-V-%3Dq4nSFADtBcbxb%3DOFVKhdLo2Y%3DE%2B7UJzA%40mail.gmail.com?utm_medium=email&utm_source=footer>
>>> .
>>>
>> <Varano et al BIOMAT.pdf>
>>
>>
>>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Morphmet" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to [email protected].
To view this discussion on the web visit 
https://groups.google.com/d/msgid/morphmet2/CAAdPgfPpBwaHVmZGaNrx0kLLtxWNpiS8SDTxGn2nfKT7un22zw%40mail.gmail.com.

Re: [MORPHMET2] How to analyze actual procrustes distance among specimens

Reply via email to