Thanks Mike for speeding up my solution. Maybe in your own function can be implemented Riemannian Kendall and Le metrics (slower than Eucliudean, however); as you said...the function assumes a preliminary GPA which is not desirable in some particular cases (not used in common practice) where MOPA (Modified Ordinary Procrustes Analysis) and OPA must be alternated. I will try do to this. Best Paolo
Il giorno mar 18 ott 2022 alle ore 05:07 Mike Collyer <[email protected]> ha scritto: > Dear Paolo, > > Regarding your comment, > > > Thus when you say "Second, the method Paolo is outlining is not Procrustes > distance from GPA" it must be noted that one can freely compute "Euclidean > distances between observed shapes after GPA alignment" but the sole > "Procrustes" Distances computed are those from real shapes and the > consensus while the others are merely (there is any diminutio here) > Euclidean distance. Thus....I'm just not sure they can be called > "Procrustes Distances" as PD intrinsically removes rotations via OPA. Of > course, numerically speaking, they are, in the majority of biological > applications, ludicrously close. > > > The way I interpret the difference between Procrustes distance and > Euclidean distance is based on whether the space in which they are > calculated is Euclidean. They are both the square root of the > inner-product of a vector of differences between corresponding coordinates > of two configurations. The Euclidean distance is calculated following > projection into the space tangent to shape space. I do not interpret > Procrustes distance to strictly require OPA between two configurations; > rather, I rely on (my interpretation of) Fred Bookstein’s description of > Procrustes distance as the square root of the inner product of a vector of > differences (the summed squared differences) between corresponding landmark > positions in two *optimally superimposed* configurations. I suspect that > you interpret “optimal" to mean strictly the outcome of OPA between two > configurations, perhaps because two configurations could not be better > aligned to each other than they could be via OPA? However, with three or > more configurations, “optimal” might (and probably usually) means something > different, like the best alignment of each configuration to the consensus > configuration, even if that means for any pair of measured configurations, > the variance between corresponding landmarks is not minimized (because > optimization minimizes the variance with respect to all pairs of each > configuration matched to the consensus). > > I think “optimal” is perhaps subjective. For example, some arbitrary > choices might have to be made with sliding semilandmarks if convergence is > not possible with GPA, by maybe arresting an analysis after, e.g., 3, 5, > 10, or 50 iterations. Different choice of iterations might and probably > will yield alternative Procrustes residuals, all of which are assumed to > have been optimally superimposed. Nevertheless, if the inner-product of > vectors of differences between Procrustes residuals are calculated for all > pairwise comparisons, before projection, I would consider the square-roots > of these values as Procrustes distances (via GPA, the method by which > optimal superimposition was defined). Adding an algorithm to find the > pairwise OPA between configurations in order to calculate Procrustes > distance means the optimal superimposition strategy was abandoned, in my > opinion. This is what I meant by “not Procrustes distance from GPA”. > Therefore, in my opinion, both distance between two configurations aligned > by OPA and distance between two configurations aligned with GPA could be > called Procrustes distance, but what is regarded as the optimal > superimposition (rotation, specifically) is different. > > Regarding the convergence of methods for empirical datasets (similar PCA > on Procrustes residuals or PCoA on pairwise Procrustes distances), I guess > it is probably the case when alignment to the mean is consistent with > alignment to any other configuration, which is also aligned to the mean. > There might be cases where several configurations are similar and would > have a better alignment with each other than with a mean that is also based > in part on configurations that are quite different. (I tend to think of an > example with fishes with fins moving all over the lateral side of a fish > body.) Perhaps pairwise OPA would yield perceptibly different Procrustes > distances than a global/GPA solution, but I have trouble believing that the > rank-order of distances could be drastically altered. I do believe we are > splitting hairs with this pedantic excursion. > > Speaking of splitting hairs, if someone should have a large data set and > wish to calculate pairwise Procrustes distances based on OPA rather than > those based on a global superimposition strategy (like GPA) after having > results from GPA (like with gpagen), below is a useful time-saving tool to > do this in R. I show this with the example Paolo provided: > > > system.time(pairwisedist <- paireucdist(plethodon$land)) # Paolo’s > function > > user system elapsed > 0.857 0.221 1.315 > > # Convergence of methods, using gpagen. > # First, assume GPA is performed > > > system.time(Y.gpa <-gpagen(plethodon$land, PrinAxes = FALSE, verbose = > TRUE, Proj = FALSE, print.progress = FALSE)) > user system elapsed > 0.007 0.000 0.007 > > # Note if we do GPA again, just with the first two specimens, and look at > the Procrustes distance > > > gpagen(Y.gpa$coords[,, 1:2], print.progress = FALSE, verbose = > TRUE)$procD > 1 > 2 0.06353302 > > # It’s the same as the paireucdist results > > > pairwisedist[1,] > res > 1.00000000 2.00000000 0.06353302 > > > # The following is a function to perform OPA for each pairwise result. > # It relies on configurations already having centroid size (no scaling > required) > # (Rescaling would be a simple step to add, if desired). > # It produces a distance matrix, like the dist function in R would. > > > pairwise.pd <- function(GPA){ > + dims <- dim(GPA$coords) > + k <- dims[2] > + p <- dims[1] > + n <- dims[3] > + ind <- combn(n, 2) > + > + res <- lapply(1:ncol(ind), function(j){ > + x <- ind[, j] > + Y1 <- GPA$coords[,, x[1]] > + Y2 <- GPA$coords[,, x[2]] > + V <- crossprod(Y1, Y2) > + sv <- La.svd(V, k, k) > + u <- sv$u; u[,k] <- u[,k]*determinant(V)$sign > + Yr <- tcrossprod(Y2, u %*% sv$vt) > + sqrt(sum((Y1 - Yr)^2)) > + }) > + > + res <- unlist(res) > + D <- as.dist(matrix(0, n, n)) > + D[1:length(res)] <- res > + return(D) > + } > > > > # Proof that it does the same thing > > > system.time(PD <- pairwise.pd(Y.gpa)) > user system elapsed > 0.044 0.003 0.054 > > > cbind(pairwisedist, PD)[1:10,] > res PD > [1,] 1 2 0.06353302 0.06353302 > [2,] 1 3 0.07727369 0.07727369 > [3,] 1 4 0.07374885 0.07374885 > [4,] 1 5 0.06326163 0.06326163 > [5,] 1 6 0.06412749 0.06412749 > [6,] 1 7 0.08359785 0.08359785 > [7,] 1 8 0.06413572 0.06413572 > [8,] 1 9 0.05978269 0.05978269 > [9,] 1 10 0.06478204 0.06478204 > [10,] 1 11 0.10952660 0.10952660 > > > all.equal(pairwisedist[,3], as.vector(PD)) > [1] TRUE > > The combination between gpagen and this method of calculating pairwise > Procrustes distance took about 0.06 seconds as opposed to the 1.3 seconds > required by the paireucdist function, which has more options, obviously. > Not a big deal here. However, if someone has a large data set (unlike this > simple one) on which GPA has already been performed, and coordinate data > have unit centroid size, relying on the quick OPA algorithm without scaling > configurations might take only seconds or minutes versus hours. (The same > function with a rescaling step increased the computation time by 0.03 > seconds for me. Thus, it is probably just faster because of having > performed GPA, already.) > > Warm regards! > > Mike > > On Oct 17, 2022, at 5:04 PM, Paolo Piras <[email protected]> wrote: > > Thanks Mike for some clarifications about gpagen(). > As for "pairwise" terminology...starting from the end of your message > "Whether the other n - 2 configurations do not matter might be interesting > to discuss.": as I said..it depends from the applications: in the case or > Parallel Transport, Geodesics, auto-parallel lines, Geodesics > splines...etc...and all application where OPA (Ordinary Procrustes > Analysis) between pairs of shapes matters ..one is forced to compute > pairwise Procrustes Distance (PD) at various steps of the above mentioned > procedures. PD is something that is always between two configurations: in > the case of GPA one is always the consensus and the others are real > observed shapes. Thus when you say "Second, the method Paolo is outlining > is not Procrustes distance from GPA" it must be noted that one can freely > compute "Euclidean distances between observed shapes after GPA alignment" > but the sole "Procrustes" Distances computed are those from real shapes and > the consensus while the others are merely (there is any diminutio here) > Euclidean distance. Thus....I'm just not sure they can be called > "Procrustes Distances" as PD intrinsically removes rotations via OPA. Of > course, numerically speaking, they are, in the majority of biological > applications, ludicrously close. > It is possible my argument suffers from some terminological purity > lack but the substance is there I guess. > "I wonder how one sees the forest from the trees?" I admit this pushed me > to do something I never did: I performed a Principal coordinates analysis > on the pairwise Procrustes Distance (via pairwise OPA); thus there is no > GPA, no consensus estimation. The resulting ordination space cannot be > closer to the pca ordination space after GPA alignment. Maybe this was > shown in old blue-book or white-book or Fred 1991 book but I'm not sure at > all. > Of course we cannot restore ANY shape from Principal Coordinates vectors > extremes. For that operation we need the mean shape coordinates, i.e. the > consensus of the GPA!. > Anyway..I think all this little discussion is useful and interesting, > independently from the fact that, operationally, does not change anything > in the common practice. > I always benefited from this kind of discussions as I always learned new > things in areas where (erroneously) I felt confident. > All the best > Paolo > > just append these lines to my previous code. > > if (!requireNamespace("BiocManager", quietly = TRUE)) > install.packages("BiocManager") > BiocManager::install("hopach") > library(hopach) > pairmat<-dissmatrix(pairwisedist[,3]) #### hopach was needed for using > this function... > library(ape) > ord<-pcoa(pairmat) > ord2<-procSym(plethodon$land) > par(mfrow=c(1,2)) > plot(ord2$PCscores[,1:2],asp=1) > plot(ord$vectors[,1:2],asp=1) > > > > > > > > > Il giorno lun 17 ott 2022 alle ore 19:30 Mike Collyer <[email protected]> > ha scritto: > >> For what it’s worth… >> >> I have neither tried to fully penetrate Paolo’s code nor the technical >> details in the paper he sent, but I did notice his code used the default of >> gpagen to project Procrustes residuals into tangent space. Therefore, the >> density plot results appear to confound methodological differences and >> whether configurations were projected into tangent space. Updating that >> portion of the code (not projecting Procrustes residuals) did not seem to >> resolve anything, and I would not expect it, since the Procrustes distances >> were measured among specimens after alignment to the mean configuration via >> GPA, not alignment between pairs of configurations. However, an >> interesting observation. The “distortion” between distances from the >> pairwise approach Paolo provided code for and the distances from the >> GPA-aligned configurations is on par with the distortion one would expect >> from projection into tangent space; i.e., so small in this empirical >> example as to be rounding error if using more than 6 decimal places. >> >> It is an interesting debatable point. Calculating Procrustes distances >> by pairs means that the same result will be found between pairs, >> irrespective of whether a full data set is expanded or subsetted (in which >> case the means could change, and thus, the Procrustes residuals could also >> change). Alternatively, without the mean, without projection into a >> tangent space, and without visualization of points in the space, perhaps >> with a mapping of shape change (transformation grids), I wonder how one >> sees the forest from the trees? >> >> Anyway, to confirm, yes, gpagen uses an “Euclidean fashion” to calculate >> distances based on Procrustes residuals that have been obtained via GPA and >> does not subset the data to all possible n(n - 1) / 2 pairs for n >> specimens, with an extra rotation step in each case. (I did confirm that >> after performing GPA with gpagen, and then performing it again on the >> Procrustes residuals obtained only for, e.g., specimens 1 and 2, specimens >> 1 and 3, …, I get the exact same results as obtained from Paolo’s code, >> meaning the pairwise approach is GPA performed on Procrustes residuals from >> GPA, by pairs.) So two important points from this. First, in gpagen, I >> should have been clear before that the Procrustes distances if calculated >> on projected residuals are actually Euclidean distances in the tangent >> space. Do not project if you wish to have Procrustes distances. Second, >> the method Paolo is outlining is not Procrustes distance from GPA. It is >> Procrustes distance based only on two shapes, which is tantamount to >> performing GPA, and subsequently realigning configurations as if all other >> n - 2 configurations do not matter. Whether the other n - 2 configurations >> do not matter might be interesting to discuss. >> >> Cheers! >> Mike >> >> >> >> On Oct 17, 2022, at 11:51 AM, Paolo Piras <[email protected]> wrote: >> >> It is very probable [correct me if I'm wrong, but see the last line of >> code below] that gpagen(), and other functions from other packages, >> computes pairwise distances in the Euclidean fashion after gpa alignment. >> While there is nothing wrong in the Euclidean fashion (in most cases) I >> think that these distances still contain rotations, albeit very small. In >> fact...aligned specimens are aligned with the consensus but not among them. >> At any pairwise comparison one should align each pair. In fact, the concept >> of "alignment" pertains to single pairs. This will have ***very little*** >> impact on the majority of cases and my message is just a technicality. >> However, if one wants to perform some kind of parallel transport in the >> shape space or size and shape space he/she should be aware about this. In >> particular, see section 3 in the attached paper. >> Let's try this with the help of an ad hoc written function (that allows >> both Euclidean/Kendall option as well as shape-space/size-and-shape-space >> option) that is a little bit slow but works... >> There are also some ancillary functions... >> All the best >> Paolo >> >> >> Just select all and run >> >> library(geomorph) >> library(Morpho) >> library(shapes) >> library(Matrix) >> data(plethodon) >> >> array2mat<-function(array,inds,vars){ >> >> if(class(array)=="matrix"){array<-array(array,dim=c(nrow(array),ncol(array),1))} >> X1 <-aperm(array,c(3,2,1)) >> dim(X1)<- c(inds, vars) >> >> if(!is.null(dimnames(array)[3])){rownames(X1)<-unlist(dimnames(array)[3])}else{rownames(X1)<-c(1:nrow(X1))} >> return(X1) >> } >> scaleshapes<-function(array){ >> k<-dim(array)[1] >> m<-dim(array)[2] >> n<-dim(array)[3] >> library(abind) >> library(shapes) >> library(Morpho) >> divmat<-function(mat){ >> mat2<-mat/cSize(mat) >> mat2 >> } >> >> if(is.matrix(array)==T){array<-array(array,dim=c(nrow(array),ncol(array),1))} >> scaled<-vecx(t(apply(array,3,divmat)),revert=T,lmdim=m) >> return(scaled) >> } >> >> centroids<-function(array){ >> meanmat<-function(mat){ >> mean<-apply(mat,2,mean) >> mean >> } >> >> if(class(array)[1]=="matrix"){array<-array(array,dim=c(dim(array)[1],dim(array)[2],1))} >> centroidi<-t(apply(array,3,meanmat)) >> return(centroidi) >> } >> >> centershapes<-function(array,vectors=NULL){ >> >> if(is.matrix(array)==T){array<-array(array,dim=c(nrow(array),ncol(array),1))} >> if(is.null(vectors)==T){centros<-centroids(array)}else{centros=vectors} >> k<-dim(array)[1] >> m<-dim(array)[2] >> n<-dim(array)[3] >> prov<-array(rep(t(centros),each=k),dim=c(k,m,n)) >> array2<-array-prov >> return(array2) >> } >> eucdist<-function(a,b,scale=T,center=T,doopa=T){ >> if(center==T){ >> a<-centershapes(a)[,,1] >> b<-centershapes(b)[,,1] >> } >> >> if(doopa==T){ >> theopa<-rotonto(a,b,scale=F,reflection=F) >> a<-a >> b<-theopa$yrot} >> >> k<-nrow(a) >> m<-ncol(a) >> >> if(scale==T){ >> a<-scaleshapes(array(a,dim=c(k,m,1)))[,,1] >> b<-scaleshapes(array(b,dim=c(k,m,1)))[,,1] >> } >> >> dist<-dist(rbind(array2mat(array(a,dim=c(k,m,1)),1,k*m),array2mat(array(b,dim=c(k,m,1)),1,k*m))) >> dist >> } >> >> >> paireucdist<-function(array,scale=T,doopa=T,euc=T,noneuc=c("kendall","ssriemdist")){ >> confs<-combn(dim(array)[3],2) >> if(!is.null(dimnames(array)[[3]])){ >> contrlist<-NULL >> for(i in 1:ncol(confs)){ >> >> contrveci<-paste(dimnames(array)[[3]][confs[,i][1]],dimnames(array)[[3]][confs[,i][2]],sep="/") >> contrlist<-c(contrlist,list(contrveci)) >> }} >> res<-NULL >> for(i in 1:ncol(confs)){ >> >> if(euc==T){resi<-eucdist(array[,,confs[,i][1]],array[,,confs[,i][2]],scale=scale,doopa=doopa)}else{ >> >> if(noneuc[1]=="kendall"){resi<-kendalldist(array[,,confs[,i][1]],array[,,confs[,i][2]])} >> >> if(noneuc[1]=="ssriemdist"){resi<-ssriemdist(array[,,confs[,i][1]],array[,,confs[,i][2]])} >> } >> res<-c(res,resi) >> print(paste(i,"of",length(confs)/2,sep=" ")) >> } >> >> if(is.null(dimnames(array)[[3]])){cbind(t(combn(dim(array)[3],2)),res)}else{ >> data.frame(unlist(contrlist),res,apply(t(confs),2,paste))} >> } >> Y.gpa <-gpagen(plethodon$land, PrinAxes = FALSE,verbose = T) >> mat<-Y.gpa$procD >> >> ##Euclidean fashion >> pairwisedist<-paireucdist(plethodon$land) >> plot(pairwisedist[,3],mat,asp=1) ##### of course they are very close in >> this case >> plot(density(pairwisedist[,3]/mat)) #### not identical >> ##Kendall fashion >> pairwisedist2<-paireucdist(plethodon$land,euc=F,noneuc="kendall") >> plot(pairwisedist2[,3],mat,asp=1) ##### of course they are very close in >> this case >> plot(density(pairwisedist2[,3]/mat)) #### not identical >> >> ### scholastic >> plot(density(pairwisedist2[,3]/pairwisedist[,3])) #### not identical >> >> >> ### >> mat2<-dist(array2mat(Y.gpa$coords,40,24)) >> range(mat2-mat) #### this confirms that "pairwise" distances in gpagen() >> are computed without re-aligning each pair. >> >> Il giorno lun 17 ott 2022 alle ore 16:49 Dagmawit Getahun < >> [email protected]> ha scritto: >> >>> Hi Mike, >>> >>> Thank you so much! This was very helpful and I was able to get the >>> distances. >>> >>> Best, >>> Dag >>> >>> On Sun, Oct 16, 2022 at 9:40 PM Mike Collyer <[email protected]> >>> wrote: >>> >>>> Dear Dag, >>>> >>>> When running the GPA with the gpagen function in geomorph, use the >>>> argument verbose = TRUE and it will include the Procrustes distances among >>>> specimens as output. This can take a while with large data sets, which is >>>> why it is not set as a default. >>>> >>>> Cheers! >>>> Mike >>>> >>>> Sent from my iPhone >>>> >>>> On Oct 16, 2022, at 9:33 PM, Dagmawit Getahun <[email protected]> >>>> wrote: >>>> >>>> Hi all, >>>> >>>> I have a dataset of 3D cranial landmarks in the Morphologika file >>>> format as a text file. I was able to upload it and run a GPA as well as PCA >>>> in R using geomorph and I wanted to know if there was an R code to analyze >>>> the actual Procrustes distances between the specimens in my dataset and >>>> whether I can do that in the format that my data set is in? >>>> >>>> Thanks in advance for the help! >>>> >>>> Best, >>>> Dag >>>> >>>> -- >>>> You received this message because you are subscribed to the Google >>>> Groups "Morphmet" group. >>>> To unsubscribe from this group and stop receiving emails from it, send >>>> an email to [email protected]. >>>> To view this discussion on the web visit >>>> https://groups.google.com/d/msgid/morphmet2/a9136b76-7691-4d80-9096-6c59afdfe8a4n%40googlegroups.com >>>> <https://groups.google.com/d/msgid/morphmet2/a9136b76-7691-4d80-9096-6c59afdfe8a4n%40googlegroups.com?utm_medium=email&utm_source=footer> >>>> . >>>> >>>> >>> >>> -- >>> Dagmawit Abebe Getahun >>> Doctoral Candidate >>> Physical Anthropology >>> The Graduate Center , CUNY >>> New York Consortium in Evolutionary Primatology >>> >>> Adjunct Lecturer, Anthropology Department >>> Lehman College, CUNY >>> >>> >>> -- >>> You received this message because you are subscribed to the Google >>> Groups "Morphmet" group. >>> To unsubscribe from this group and stop receiving emails from it, send >>> an email to [email protected]. >>> To view this discussion on the web visit >>> https://groups.google.com/d/msgid/morphmet2/CALLJY8%2BejP-9F0-V-%3Dq4nSFADtBcbxb%3DOFVKhdLo2Y%3DE%2B7UJzA%40mail.gmail.com >>> <https://groups.google.com/d/msgid/morphmet2/CALLJY8%2BejP-9F0-V-%3Dq4nSFADtBcbxb%3DOFVKhdLo2Y%3DE%2B7UJzA%40mail.gmail.com?utm_medium=email&utm_source=footer> >>> . >>> >> <Varano et al BIOMAT.pdf> >> >> >> > -- You received this message because you are subscribed to the Google Groups "Morphmet" group. To unsubscribe from this group and stop receiving emails from it, send an email to [email protected]. To view this discussion on the web visit https://groups.google.com/d/msgid/morphmet2/CAAdPgfPpBwaHVmZGaNrx0kLLtxWNpiS8SDTxGn2nfKT7un22zw%40mail.gmail.com.
