RE: multi-variate PGLS with GMM

morphmet_moderator Fri, 19 Apr 2013 19:25:42 -0700

----- Forwarded message from Paolo Piras <paolo.pi...@uniroma3.it> -----

Date: Wed, 17 Apr 2013 09:38:22 -0400
From: Paolo Piras <paolo.pi...@uniroma3.it>
Reply-To: Paolo Piras <paolo.pi...@uniroma3.it>
Subject: RE: multi-variate PGLS with GMM
To: "morphmet@morphometrics.org" <morphmet@morphometrics.org>

Hi all,

Assuming brownian motion, and dealing with MANY landmarks, i.e. face to the impossibility to apply parametric test of significance, a solution could be the same adopted in MorphoJ. It uses a completely randomized approach thus avoiding restrictions due to "fat data" (i.e. many vars, few cases). It is something like a "phylogenetic adonis"; adonis() can be found in vegan package.

Even if....... I cannot reproduce EXACTLY the same results of MorphoJ procedure in R using adonis and a phyl cov matrix; I think it is a problem of centering the data. Any advice for that point?

By the way I wrote a couple of functions (genpgls(), accompanied by another required function- testsix() - ), mainly inspired by that of Dean Adams, that allows a number of option in performing PGLS (univariate, multivariate, multiple or multiple-multivariate). For univariate Y it estimates the best mode of evoltion and transforms the tree accordingy before computing the covariance matrix. Any testing for amelioring the function is WELCOME.

best

paolo

#This function takes as arguments a tree,a dependent(s) table,an independent(s) table,a vector specifying the criterion
#upon which the tree is transformed (for univariate y only) and an integer indicating the permutations upon which the non
#phylogenetic correlation is tested for significance; it is useful when many variables enter the model.
#For univariate y the function calls fitContinuous() in order to find
#the most supported mode of evolution (basing on AICs) among BM", "OU","lambda", "kappa", "delta", "EB" and "white noise" models.
#This is done automatically using the function "testsix()" appended below. This is a mere translation in function of the example given in GEIGER manual. HERE SOMETIMES THE SCALING OF THE VARIABLE ## #INFLUENCES THE BEHAVIOUR OF fitcontinuous(). Muliplying or dividing by a proper constant solves the problem nearly always.

#By specifying the argument "mode", one can choose if the tree should be transformed according to the best mode
#of evolution found for y or for regression residuals of the linear model "y~x". This holds for univariate y. For multivariate Y one can write "no transf".
#For multivariate Y Brownian motion is assumed and the GLOBAL multivariate test (Wilks'lambda) is performed using PICs. Here still holds the problem of fat data (see above)

# Individual regressions of Ys on X(s) table are performed via PGLS.
#A number of results are printed. If y is univariate, the first ones regard the phylogenetic signal in y and in regression residuals (using both lambda and K criteria). Then the parameters and AICs of fitted models

# are printed followed by pgls and PICs analysis

genpgls<-function(tree,y,x,mode=c("use y","use regression residuals","no transf"),nperm){
require(phytools)
require(picante)
require(ape)
x<-as.matrix(match.phylo.data(tree,x)$data)
y<-as.matrix(match.phylo.data(tree,y)$data)
tree<-match.phylo.data(tree,x)$phy

if(dim(y)[2]<2){lmuniy<-lm(y~x)

phylosig_in_residuals<-phylosig(tree,resid(lmuniy),method="lambda",test=T)
phylosig_in_residuals2<-phylosig(tree,resid(lmuniy),method="K",test=T)

phylosig_in_y<-phylosig(tree,y,method="lambda",test=T)
phylosig_in_y2<-phylosig(tree,y,method="K",test=T)

print("PHYLOSIG IN Y USING LAMBDA")
print(phylosig_in_y)
print("PHYLOSIG IN Y USING K")
print(phylosig_in_y2)
print("PHYLOSIG IN RESIDUALS USING LAMBDA")
print(phylosig_in_residuals)
print("PHYLOSIG IN RESIDUALS USING K")
print(phylosig_in_residuals2)
}else{NULL}

if(dim(y)[2]<2&mode=="use y"){bestmodely<-testsix(tree,y)}else{vcvy<-vcv.phylo(tree)}
if(dim(y)[2]<2&mode=="use regression residuals"){bestmodely<-testsix(tree,resid(lmuniy))}else{vcvy<-vcv.phylo(tree)}

if(exists("bestmodely")==TRUE){
  if(bestmodely$AICs[1,1]>0&bestmodely$AICs[1,7]>0){
   modepositiony<-which(bestmodely$AICs-min(bestmodely$AICs)==0)
   if(modepositiony==2){
   treey<-lambdaTree(tree,bestmodely$parameters[modepositiony])
   vcvy<-vcv.phylo(treey)
   }else{NULL}
   if(modepositiony==3){
   treey<-deltaTree(tree,bestmodely$parameters[modepositiony])
   vcvy<-vcv.phylo(treey)
   }else{NULL}
   if(modepositiony==4){
   treey<-kappaTree(tree,bestmodely$parameters[modepositiony])
   vcvy<-vcv.phylo(treey)
   }else{NULL}
   if(modepositiony==5){
   treey<-ouTree(tree,bestmodely$parameters[modepositiony])
   vcvy<-vcv.phylo(treey)
   }else{NULL}
   if(modepositiony==6){
   treey<-eyponentialchangeTree(tree,bestmodely$parameters[modepositiony])
   vcvy<-vcv.phylo(treey)
   }else{NULL}
  }else{vcvy<-vcv.phylo(tree)}
}else{vcvy<-vcv.phylo(tree)}

ad<-adonis(y~as.matrix(x),method="euclidean",permutations=nperm)

int<-c(rep(1,nrow(x))) #### add a column of ones for intercept that will be subjected to transformation together with x table

mDnewy<-solve(svd(vcvy)$u%*%diag(sqrt(svd(vcvy)$d))%*%t(svd(vcvy)$u))

ynew<-mDnewy %*% y; xnew<-mDnewy%*%as.matrix(cbind(as.matrix(int),x)) ## I transform both x and intercept estimate

pgls<-lm(ynew~xnew-1)

summ<-summary(pgls)##### I SPECIFY "-1" BECAUSE lm() add a column of ones by default;
#####but we already added (and transformed) it; this pgls procedure estimates the intercept

if(exists("treey")==TRUE){an<-summary(lm(multpic(y,treey)~multpic(x,treey)-1))}else{an<-anova(lm(multpic(y,tree)~multpic(x,tree)-1))}

print("INDIVIDUAL CORRELATIONS UNDER PGLS")
print(summ)

if(exists("bestmodely")==TRUE){

if(mode==c("use y")){print("BEST MODEL PARAMETERS FOR Y")}else{print("BEST MODEL PARAMETERS FOR REGRESSION RESIDUALS")}
print(bestmodely)
}
else{NULL}

print("GLOBAL CORRELATION UNDER PICS")
print(an)
print("NON PHYLOGENETIC CORRELATION USING PERMUTATIONS")
print(ad)
}

testsix<-function(tree,x){
require(picante)
require(ape)
require(geiger)

x<-match.phylo.data(tree,x)$data
tree<-match.phylo.data(tree,x)$phy

brownfit<-fitContinuous(tree,x)
aic.brown<-numeric(1)
for(i in 1:1) aic.brown[i]<-brownfit[[i]]$aic

lambdaFit<-fitContinuous(tree, x, model="lambda")
d.lambda<-numeric(5)
for(i in 1:1) d.lambda[i]=2*(lambdaFit[[i]]$lnl-brownfit[[i]]$lnl)
# Calculate p values assuming chi-squared distribution with 1 d.f.
p.lambda=pchisq(d.lambda, 1, lower.tail=FALSE)
aic.lambda<-numeric(1)
for(i in 1:1) aic.lambda[i]<-lambdaFit[[i]]$aic

deltaFit<-fitContinuous(tree, x, model="delta")
d.delta<-numeric(1)
for(i in 1:1) d.delta[i]=2*(deltaFit[[i]]$lnl-brownfit[[i]]$lnl)
# Calculate p values assuming chi-squared distribution with 1 d.f.
p.delta=pchisq(d.delta, 1, lower.tail=FALSE)
aic.delta<-numeric(1)
for(i in 1:1) aic.delta[i]<-deltaFit[[i]]$aic

kappaFit<-fitContinuous(tree, x, model="kappa")
# Compare likelihoods:
d.kappa<-numeric(1)
for(i in 1:1) d.kappa[i]=2*(kappaFit[[i]]$lnl-brownfit[[i]]$lnl)
# Calculate p values assuming chi-squared distribution with 1 d.f.
p.kappa=pchisq(d.kappa, 1, lower.tail=FALSE)
aic.kappa<-numeric(1)
for(i in 1:1) aic.kappa[i]<-kappaFit[[i]]$aic

ouFit<-fitContinuous(tree, x, model="OU",bounds=list(alpha=c(0.0001, 10), beta=c(1, 100)))
# Compare likelihoods:
d.ou<-numeric(1)
for(i in 1:1) d.ou[i]=2*(ouFit[[i]]$lnl-brownfit[[i]]$lnl)
# Calculate p values assuming chi-squared distribution with 1 d.f.
p.ou=pchisq(d.ou, 1, lower.tail=FALSE)
aic.ou<-numeric(1)
for(i in 1:1) aic.ou[i]<-ouFit[[i]]$aic

ebFit<-fitContinuous(tree, x, model="EB",bounds=list(alpha=c(0.0001, 10), beta=c(1, 100)))
# Compare likelihoods:
d.eb<-numeric(1)
for(i in 1:1) d.eb[i]=2*(ebFit[[i]]$lnl-brownfit[[i]]$lnl)
# Calculate p values assuming chi-squared distribution with 1 d.f.
p.eb=pchisq(d.eb, 1, lower.tail=FALSE)
aic.eb<-numeric(1)
for(i in 1:1) aic.eb[i]<-ebFit[[i]]$aic

whiteFit<-fitContinuous(tree, x, model="white")
# Compare likelihoods:
d.white<-numeric(1)
for(i in 1:1) d.white[i]=2*(whiteFit[[i]]$lnl-brownfit[[i]]$lnl)
# Calculate p values assuming chi-squared distribution with 1 d.f.
p.white=pchisq(d.white, 1, lower.tail=FALSE)
aic.white<-numeric(1)
for(i in 1:1) aic.white[i]<-whiteFit[[i]]$aic

aic.all<-cbind(aic.brown, aic.lambda, aic.delta, aic.kappa, aic.ou, aic.eb,aic.white)
foo<-function(x) x-x[which(x==min(x))]
daic<-t(apply(aic.all, 1, foo))
rownames(daic)<-colnames(x)
colnames(daic)<-c("Brownian", "Lambda", "Delta", "Kappa", "OU", "EB","whitw")
cat("Table of delta-aic values; zero - best model\n")
print(daic, digits=2)

parameters<-cbind(NA,lambdaFit$Trait1$lambda,deltaFit$Trait1$delta,kappaFit$Trait1$lambda,ouFit$Trait1$alpha,ebFit$Trait1$a,NA)
colnames(parameters)<-c("Brown","lambda","delta","kappa","ou","eb","White")
print("PARAMETER ESTIMATES")
print(parameters)

print(aic.all)

out<-list(parameters=parameters,AICs=aic.all)
}

Da: morphmet_modera...@morphometrics.org [morphmet_modera...@morphometrics.org]
Inviato: mercoledì 17 aprile 2013 4.13
A: morphmet@morphometrics.org
Oggetto: Re: multi-variate PGLS with GMM

----- Forwarded message from Dean Adams <dcad...@iastate.edu> -----

Date: Mon, 15 Apr 2013 10:11:06 -0400
From: Dean Adams <dcad...@iastate.edu>
Reply-To: Dean Adams <dcad...@iastate.edu>
Subject: Re: multi-variate PGLS with GMM
To: morphmet@morphometrics.org

Blake,

Yes, one can account for phylogenetic non-independence while performing MANOVA or MANCOVA on shape variables. An early example is in Ruber and Adams 2001 (J. Evol. Biol.). That analysis was implemented in NTSYS.

One could also implement this in R. The standard approach is the use the 'gls' function, and provide it the correlation structure as specified by a phylogenetic covariance matrix. A univariate example is: summary(gls(Y~X, correlation=my.phy.cor))

Unfortunately, one cannot do the equivalent for multivariate data (summary(manova(gls ...))) because the 'manova' function in R does not utilize the correlation structure even if it provided. Thus, the phylogenetic and non-phylogenetic GLS yield the same numerical results.

To properly implement multivariate PGLS in R, one must first transform the X and Y data by the phylogeny under a Brownian motion model, using the phylogenetic transform procedure in Garland and Ives (2000). Then a standard multivariate MANCOVA may be used. I have written R code for this, which is in the supplemental material of a paper currently in press at Evolutionary Biology. The function is found below, along with the citation.

In your specific case, the input X-matrix would contain a column of ones, the ecological variables, and body size. Note that you must have more species than shape-dimensions for the algebra of this procedure to be completed. Also, one should rotate the aligned Procrustes coordinates to their principal component dimensions and eliminate the redundant dimensions due to superimposition to avoid singularities in the MANOVA procedure (7 in your case if you have a set of fixed, 3D landmarks).

Hope this helps.

Dean

--
Dr. Dean C. Adams
Professor
Department of Ecology, Evolution, and Organismal Biology
Department of Statistics
Iowa State University
Ames, Iowa
50011
www.public.iastate.edu/~dcadams/
phone: 515-294-3834

Outomuro, D., D.C. Adams, and F. Johansson. 2013. Evolution of wing shape in ornamented-winged damselflies. Evolutionary Biology. (In Press).

##MULTIVARIATE PLGS FUNCTION:   DC Adams. 2013
mult.pgls<-function(phy,y.mat,x.mat){
   phy.mat<-vcv(phy)
   x.mat<-cbind(matrix(1,length(phy$tip.label)),x.mat)
   x.mat<-x.mat[rownames(phy.mat),]
   y.mat<-y.mat[rownames(phy.mat),]
   mDnew<-solve(svd(phy.mat)$u%*%diag(sqrt(svd(phy.mat)$d))%*%t(svd(phy.mat)$u))
   ynew<-mDnew %*% y.mat
   xnew<-mDnew %*% x.mat
   summary=summary(manova(lm(ynew~xnew-1)))
   return(list(summary=summary))
}

On 4/12/2013 11:23 PM, morphmet_modera...@morphometrics.org wrote:

----- Forwarded message from Blake Dickson <b.dick...@unsw.edu.au> -----

     Date: Thu, 11 Apr 2013 00:58:04 -0400
      From: Blake Dickson <b.dick...@unsw.edu.au>
      Reply-To: Blake Dickson <b.dick...@unsw.edu.au>
      Subject: multi-variate PGLS with GMM
      To: "morphmet@morphometrics.org"

Hi All,

 I have a set of 3D procrustes co-ordinates for a bunch of taxa I would like to
test against ecological characteristics. I am wondering whether it is possible
to perform a multivariate PGLS - controlling for both phylogeny and body size,
and if so what is the best way to go about it. 

 Cheers,
 Blake

 Blake Dickson, School of BEES
 University of New South Wales,
 NSW, Australia
 b.dick...@unsw.edu.au

----- End forwarded message -----

----- End forwarded message -----

RE: multi-variate PGLS with GMM

Reply via email to