Re: [R-sig-phylo] A perfect storm: phylogenetic trees, random effects and zero-inflated binomial data

2015-10-14 Thread Jarrod Hadfield

Dear Diederik,

The lack of convergence arises because the residual variance is
non-identifiable with binary data, yet you have only a very weak prior on
it. You should fix the residual variance at some value (I usually use 1):


prior.test<-list(R=list(V=1,fix=1), G=list(G1=list(V=1, nu=0.002),G2 =  
list(V=1, nu=0.002)))


You might also want to consider using "threshold" rather than
"categorical". The models are similar, but the former uses a probit link
and the latter a logit link, and the MCMC algorithm is more efficient for
the former. I would also advise updating to version 2.22, as it now
performs better for phylogenetic models.
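
For concreteness, here is a sketch of how the fixed-residual prior above and
the "threshold" family would slot into your call (object and variable names
are taken from your message quoted below; the chain-length settings are
placeholders, not recommendations):

test.phylo1 <- MCMCglmm(invasiveStatus ~ CloseCentralInv,
                        random = ~ species + country,
                        ginverse = list(species = inv.mytree$Ainv),
                        family = "threshold",  # probit link
                        prior = prior.test,    # residual variance fixed at 1
                        nitt = 500000, thin = 500, burnin = 50000,  # placeholders
                        data = data.trade,
                        verbose = FALSE)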


You may still find that problems persist with rare-event data, but get  
back to the list if this is the case after fixing the prior.


It has been argued that the h2 (or lambda if you prefer) for  
categorical traits should be fixed at one rather than estimated. If  
you did want to do this (I am not convinced it is always a reasonable  
thing to do) then you can use MCMCglmmRAM  
(http://jarrod.bio.ed.ac.uk/MCMCglmmRAM_2.22.tar.gz).


Cheers,

Jarrod

Quoting Diederik Strubbe  on Wed, 14  
Oct 2015 16:18:28 +0200:



Dear all,



 A while ago, I was kindly advised to try MCMCglmm to investigate
invasion success of non-native species while accounting for phylogenetic
relatedness. I have managed to run some explanatory models but run into
convergence problems…



The data are: (1) /introduction events/ of non-native species. There are
about 4,000 introduction events, but only about 40 have resulted in an
established population – so the number of ‘events’ is low.
Successful introductions are coded as “1” and failures as “0”; (2) the
/phylogenetic tree/ is an ultrametric majority-rule consensus tree
(class “phylo”,  number of tips: 359, number of nodes: 298), (3)
/country/ in which species are introduced is included as a random
effect, (4) a set of continuous /explanatory variables/.



I have mainly explored univariate models (i.e. one explanatory variable
at a time), but including multiple explanatory variables also results in
convergence problems. I use the following model, with the continuous
variable ‘CloseCentralInv’ as the single explanatory variable.



prior.test <- list(R = list(V = 1, nu = 0.002),
                   G = list(G1 = list(V = 1, nu = 0.002),
                            G2 = list(V = 1, nu = 0.002)))

test.phylo1 <- MCMCglmm(invasiveStatus ~ CloseCentralInv,
                        random = ~ species + country,
                        ginverse = list(species = inv.mytree$Ainv),
                        nitt = 100, thin = 500, burnin = 1,
                        family = "categorical",
                        data = data.trade,
                        prior = prior.test,
                        verbose = FALSE)



As far as I understand, the basic model output looks reasonable (see link 1
below), although effective sample sizes for the random effects seem to be
small.



However, /Heidelberger and Welch's/ convergence diagnostic often passes
the stationarity test but not the halfwidth test (example: link 2 below).
/Gelman and Rubin/'s convergence diagnostic (calculated on two chains)
indicates potential scale reduction factors that are often well above 1
(example: link 3 below). A plot of the Gelman and Rubin diagnostic also
does not clearly indicate after how many iterations convergence is to be
expected (link 4 below). I tried to use Raftery and Lewis's diagnostic to
estimate the number of necessary iterations, but this invariably outputs
“You need a sample size of at least 3746 with these values of q, r and s”.
Traces for this model run can be accessed at link 5 below.



I tried upping the number of iterations to 1,000,000 for a number of
variables, but this does not seem to alleviate the convergence problems.



I can see the following possible causes: (1) I misspecified the model and
it is not doing what I think it should be doing, (2) I actually need many
more iterations, (3) I need to specify stronger priors, (4) I am trying to
get blood from a stone (i.e. the data do not allow such an analysis).



I might add that if I look at the p-values and estimates obtained
through the (non-converging) MCMCglmm runs, the results are very
similar to those of a ‘simpler’ GLMM where phylogeny is accounted for by
using a nested random effect for taxon and genus.



Any suggestions on how to proceed are much appreciated.



Best wishes and thanks in advance,



Diederik





1. https://www.dropbox.com/s/kffj8o3nf9nz0xn/summary%28test.phylo1%29.png?dl=0

2. https://www.dropbox.com/s/t25y37slvkqyd8l/heidel.diag.png?dl=0

3. https://www.dropbox.com/s/kbxrc73ep1kjli5/gelman.diag.png?dl=0

4. https://www.dropbox.com/s/0e2unw7rmr7rt2y/gelman.plot.png?dl=0

5. https://www.dropbox.com/s/7vvqf0r7she71hu/traces.png?dl=0

Re: [R-sig-phylo] Question on pCCA (phyl.cca)

2015-10-14 Thread sandra goutte
Dear Liam and Julien,

Thank you both for your help. I have tried both solutions on the same
dataset under a BM model to compare them. I expected to get similar
results, at least for the general conclusions on the significance of
correlations.

for Julien's solution, I ran:

model<-mvBM(tree, data, model="BM1",method="pic")
cov2cor(model$sigma)

# bootstrapping to get p-values
data_boot <- simulate(model, tree=tree, nsim=100)
model_boot <- lapply(data_boot, function(x){ cov2cor(mvBM(tree, x,
model="BM1", method="pic", echo=FALSE, diagnostic=FALSE)$sigma) })
correl_boot <- t(sapply(model_boot, function(x){ x[upper.tri(x)] }))
p_values <- apply(correl_boot,2,function(x){ mean(x < 0) })

# the correlations and associated p-values
correl <- t( cov2cor(model$sigma)[upper.tri(cov2cor(model$sigma))] )
res<-rbind(correl, p_values)


For Liam's solution, I ran:

corbr<-corBrownian(phy=tree)

glsp1<- gls(ac.var1~env.var1+env.var2+body.size+ac.var2+ac.var3+ac.var4,
correlation=corbr, data=data)
glsp2<- gls(ac.var2~env.var1+env.var2+body.size+ac.var1+ac.var3+ac.var4,
correlation=corbr, data=data)
glsp3<- gls(ac.var3~env.var1+env.var2+body.size+ac.var1+ac.var2+ac.var4,
correlation=corbr, data=data)
glsp4<- gls(ac.var4~env.var1+env.var2+body.size+ac.var1+ac.var2+ac.var3,
correlation=corbr, data=data)
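
(The pGLS coefficients and p-values reported below were read off the gls
summaries, e.g. along these lines:)

# slope estimates, standard errors and p-values for one of the pGLS fits
summary(glsp1)$tTable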


I'm getting many more significant correlations with pGLS than with mvBM,
and the correlation coefficients are sometimes very different as well. The
results I get from the pGLS are generally more intuitive biologically. For
example, for the dominant frequency of calls:


DF ~ body.size: with mvBM -0.29 (p-value = 0.92) and with pGLS -0.26
(p-value = 0.016). Here the coefficients are close, but the significance is
very different. Biologically, we expect a significant correlation between
these two variables, so it is quite surprising to have such a high p-value
for the mvBM method.

DF ~ env.var1: with mvBM 0.35 (p-value = 0.03) and with pGLS 0.021
(p-value = 0.09). Here the coefficients are really quite different...

Is it normal that I'm getting such different results? Did I miss something?

Best,
Sandra.

2015-10-13 15:18 GMT-03:00 Liam J. Revell :

> Hi Sandra.
>
> If you are interested in the individual correlations, why not just do a
> PGLS in which (one at a time) the variables of interest are set as the
> independent variable?
>
> So, for instance, to test "is environmental variable A correlated with
> acoustic variable Z" (controlling for all other environmental variables &
> acoustic variables), one would do:
>
> acoustic variable Z ~ environmental variable A + all other environmental
> variables + all other acoustic variables + phylogenetic residual error
>
> If you have collinearity, then you may not be able to distinguish between
> the effects of different variables - but this will be true for all
> statistical models. (Imagine the worst case in which variables A & B were
> perfectly collinear; then it becomes theoretically impossible to determine
> whether A or B is affecting Z, no matter what we do.)
>
> All the best, Liam
>
> Liam J. Revell, Associate Professor of Biology
> University of Massachusetts Boston
> web: http://faculty.umb.edu/liam.revell/
> email: liam.rev...@umb.edu
> blog: http://blog.phytools.org
>
>
> On 10/13/2015 5:47 AM, Julien Clavel wrote:
>
>> Hi Sandra,
>>
>> I think a multivariate correlational model with your continuous traits
>> as response variables can be used to do what you want.
>> For instance, assuming Brownian motion:
>>
>>
>> library(mvMORPH)
>> set.seed(123)
>>
>> ## First simulate some data (3 traits)
>> # simulate a tree
>> tree <- pbtree(n=100)
>>
>> # Define a correlation matrix
>> C <- matrix(c(1,0.7,0.3,0.7,1,0.3,0.3,0.3,1),3)
>>
>> # Define the variance to build a covariance matrix
>> D <- diag(c(sqrt(0.3),sqrt(0.2),sqrt(0.1)),3)
>>
>> # Compute the covariance matrix
>> sigma <- D%*%C%*%D
>>
>> # Ancestral states
>> theta <- c(0,0,0)
>>
>> # simulate the traits
>> data<-mvSIM(tree, param=list(sigma=sigma, ntraits=3, mu=theta,
>> names_traits=c("Trait 1","Trait 2", "Trait 3")),
>> model="BM1", nsim=1)
>>
>>
>>
>>
>> ## Now we fit the multivariate brownian model
>> model<-mvBM(tree, data, model="BM1",method="pic")
>>
>> # Obtain the traits correlations (very close to the generating values):
>> cov2cor(model$sigma)
>>
>> # You can compare the fit of a model with independent Brownian motion
>> model_2<-mvBM(tree, data, model="BM1",method="pic",
>> param=list(constraint="diagonal"))
>>
>> # Test the significance with an LRT test
>> LRT(model,model_2)
>>
>>
>>
>>
>> # But if you want to assess the significance for each trait when there
>> # are more than two traits, use the bootstrap instead:
>>
>> # Simulate data under the MLE and then fit the model to simulated traits
>> data_boot <- simulate(model, tree=tree, nsim=100)
>> model_boot <- lapply(data_boot, function(x){ cov2cor(mvBM(tree, x,
>> model="BM1", method="pic", echo=FALSE, 

Re: [R-sig-phylo] Question on pCCA (phyl.cca)

2015-10-14 Thread Julien Clavel
Hi Sandra,

Maybe I chose a bad example to illustrate the approach. I simulated
positive correlations, so the way the p-value was computed was only for
illustrative purposes and is wrong for negative correlations (we need a
two-sided test). Instead, you should check whether the 95% confidence
interval for each parameter includes 0 or not.

Just replace:
p_values <- apply(correl_boot,2,function(x){ mean(x < 0) }) 

by something like this (if I'm not mistaken):

pvalue <- function(x){
  neg <- mean(x > 0)
  pos <- mean(x < 0)
  pval <- min(neg, pos) * 2
  return(pval)
}

p_values <- apply(correl_boot,2, pvalue) 

This should fix the problem.
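
Or, equivalently, check the bootstrap confidence intervals directly (a sketch):

# 95% bootstrap interval for each correlation; an interval that excludes 0
# indicates a correlation significant at the 5% level
CI_95 <- apply(correl_boot, 2, quantile, probs = c(0.025, 0.975))
CI_95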

Julien


> Date: Wed, 14 Oct 2015 11:35:43 -0300 
> Subject: Re: [R-sig-phylo] Question on pCCA (phyl.cca) 
> From: gou...@mnhn.fr 
> To: liam.rev...@umb.edu 
> CC: julien.cla...@hotmail.fr; r-sig-phylo@r-project.org 
> 
> Dear Liam and Julien, 
> 
> Thank you both for your help. I have tried both solutions on the same 
> dataset under a BM model to compare them. I expected to get similar 
> results, at least for the general conclusions on the significance of 
> correlations. 
> 
> for Julien's solution, I ran: 
> 
> model<-mvBM(tree, data, model="BM1",method="pic") 
> cov2cor(model$sigma) 
> 
> # bootstrapping to get p-values 
> data_boot <- simulate(model, tree=tree, nsim=100) 
> model_boot <- lapply(data_boot, function(x){ cov2cor(mvBM(tree, x, 
> model="BM1", method="pic", echo=FALSE, diagnostic=FALSE)$sigma) }) 
> correl_boot <- t(sapply(model_boot, function(x){ x[upper.tri(x)] })) 
> p_values <- apply(correl_boot,2,function(x){ mean(x < 0) }) 
> 
> # the correlations and associated p-values 
> correl <- t( cov2cor(model$sigma)[upper.tri(cov2cor(model$sigma))] ) 
> res<-rbind(correl, p_values) 
> 
> 
> For Liam's solution, I ran: 
> 
> corbr<-corBrownian(phy=tree) 
> 
> glsp1<- 
> gls(ac.var1~env.var1+env.var2+body.size+ac.var2+ac.var3+ac.var4, 
> correlation=corbr, data=data) 
> glsp2<- 
> gls(ac.var2~env.var1+env.var2+body.size+ac.var1+ac.var3+ac.var4, 
> correlation=corbr, data=data) 
> glsp3<- 
> gls(ac.var3~env.var1+env.var2+body.size+ac.var1+ac.var2+ac.var4, 
> correlation=corbr, data=data) 
> glsp4<- 
> gls(ac.var4~env.var1+env.var2+body.size+ac.var1+ac.var2+ac.var3, 
> correlation=corbr, data=data) 
> 
> 
> I'm getting much more significant correlations with pGLS than with 
> mvBM, and the correlation coefficients are sometimes very different as 
> well. The results I get from the pGLS are generally more intuitive 
> biologically. For example, for the dominant frequency of calls: 
> 
> 
> DF~body.size with mvBM: -0.29 p-value=0.92 and with pGLS: -0.26 
> p-value=0.016. Here the coefficients are close, but the significance is 
> very different. Biologically, we expect a significant correlation 
> between these two variables, so it is quite surprising to have such a 
> high p-value for the mvBM method. 
> 
> DF~env.var1 with mvBM:0.35 p-value=0.03 and with pGLS: 0.021 p-value= 
> 0.09. Here the coefficients are really quite different... 
> 
> Is it normal that I'm getting such different results? Did I miss something? 
> 
> Best, 
> Sandra. 
> 
> 
> 
> 
> 
> 
> 
> 2015-10-13 15:18 GMT-03:00 Liam J. Revell 
> >: 
> Hi Sandra. 
> 
> If you are interested in the individual correlations, why not just do a 
> PGLS in which (on at a time) the variables of interest are set as the 
> independent variable. 
> 
> So, for instance, to test "is environemental variable A correlated with 
> acoustic variable Z" (controlling for all other environmental variables 
> & acoustic variables), one would do: 
> 
> acoustic variable Z ~ environmental variable A + all other 
> environmental variable + all other acoustic variables + phylogenetic 
> residual error 
> 
> If you have collinearity, then you may not be able distinguish between 
> the effects of different variables - but this will be true for all 
> statistical models. (Imagine the worst case in which variables A & B 
> were perfectly collinear, well then it becomes theoretically impossible 
> to determine if A is affecting Z or B, no matter what we do.) 
> 
> All the best, Liam 
> 
> Liam J. Revell, Associate Professor of Biology 
> University of Massachusetts Boston 
> web: http://faculty.umb.edu/liam.revell/ 
> email: liam.rev...@umb.edu 
> blog: http://blog.phytools.org 
> 
> 
> On 10/13/2015 5:47 AM, Julien Clavel wrote: 
> Hi Sandra, 
> 
> I think, a multivariate correlational model with your continuous traits 
> as response variables can be used to do what you want. 
> For instance, assuming Brownian motion: 
> 
> 
> library(mvMORPH) 
> set.seed(123) 
> 
> ## First simulate some data (3 traits) 
> # simulate a tree 
> tree <- pbtree(n=100) 
> 
> # Define a correlation matrix 
> C <- matrix(c(1,0.7,0.3,0.7,1,0.3,0.3,0.3,1),3) 
> 
> # Define the variance to build a covariance matrix 
> D <- 

Re: [R-sig-phylo] A perfect storm: phylogenetic trees, random effects and zero-inflated binomial data

2015-10-14 Thread Diederik Strubbe
Dear all,

 

 A while ago, I was kindly advised to try MCMCglmm to investigate
invasion success of non-native species while accounting for phylogenetic
relatedness. I have managed to run some explanatory models but run into
convergence problems…

 

The data are: (1) /introduction events/ of non-native species. There are
about 4,000 introduction events, but only about 40 have resulted in an
established population – so the number of ‘events’ is low.
Successful introductions are coded as “1” and failures as “0”; (2) the
/phylogenetic tree/ is an ultrametric majority-rule consensus tree
(class “phylo”,  number of tips: 359, number of nodes: 298), (3)
/country/ in which species are introduced is included as a random
effect, (4) a set of continuous /explanatory variables/.

 

I have mainly explored univariate models (i.e. one explanatory variable
at a time), but including multiple explanatory variables also results in
convergence problems. I use the following model, with the continuous
variable ‘CloseCentralInv’ as the single explanatory variable.

 

prior.test <- list(R = list(V = 1, nu = 0.002),
                   G = list(G1 = list(V = 1, nu = 0.002),
                            G2 = list(V = 1, nu = 0.002)))

test.phylo1 <- MCMCglmm(invasiveStatus ~ CloseCentralInv,
                        random = ~ species + country,
                        ginverse = list(species = inv.mytree$Ainv),
                        nitt = 100, thin = 500, burnin = 1,
                        family = "categorical",
                        data = data.trade,
                        prior = prior.test,
                        verbose = FALSE)

 

As far as I understand, the basic model output looks reasonable (see link 1
below), although effective sample sizes for the random effects seem to be
small.
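
(A quick way to inspect these, assuming the fitted object is the
test.phylo1 above, is coda's effectiveSize; a sketch:)

library(coda)
# effective sample sizes for the variance components (species, country, units)
effectiveSize(test.phylo1$VCV)
# and for the fixed effects
effectiveSize(test.phylo1$Sol)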

 

However, /Heidelberger and Welch's/ convergence diagnostic often passes
the stationarity test but not the halfwidth test (example: link 2 below).
/Gelman and Rubin/'s convergence diagnostic (calculated on two chains)
indicates potential scale reduction factors that are often well above 1
(example: link 3 below). A plot of the Gelman and Rubin diagnostic also
does not clearly indicate after how many iterations convergence is to be
expected (link 4 below). I tried to use Raftery and Lewis's diagnostic to
estimate the number of necessary iterations, but this invariably outputs
“You need a sample size of at least 3746 with these values of q, r and s”.
Traces for this model run can be accessed at link 5 below.
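
(For reference, these diagnostics were computed with coda along the
following lines; this is a sketch, and "test.phylo2" stands in for a second,
independently run chain:)

library(coda)
# Heidelberger & Welch: stationarity and halfwidth tests
heidel.diag(test.phylo1$VCV)
# Gelman & Rubin: potential scale reduction factors from two chains
chains <- mcmc.list(test.phylo1$VCV, test.phylo2$VCV)
gelman.diag(chains)
gelman.plot(chains)
# Raftery & Lewis: run-length estimate (default q, r and s)
raftery.diag(test.phylo1$VCV)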

 

I tried upping the number of iterations to 1,000,000 for a number of
variables, but this does not seem to alleviate the convergence problems.

 

I can see the following possible causes: (1) I misspecified the model and
it is not doing what I think it should be doing, (2) I actually need many
more iterations, (3) I need to specify stronger priors, (4) I am trying to
get blood from a stone (i.e. the data do not allow such an analysis).

 

I might add that if I look at the p-values and estimates obtained
through the (non-converging) MCMCglmm runs, the results are very
similar to those of a ‘simpler’ GLMM where phylogeny is accounted for by
using a nested random effect for taxon and genus.
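
(For comparison, the ‘simpler’ GLMM was along these lines; a sketch only,
assuming lme4 and hypothetical grouping columns genus and taxon in
data.trade:)

library(lme4)
# binomial GLMM with taxon nested in genus, plus country, as random intercepts
simple.glmm <- glmer(invasiveStatus ~ CloseCentralInv + (1 | genus/taxon) + (1 | country),
                     family = binomial, data = data.trade)
summary(simple.glmm)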

 

Any suggestions on how to proceed are much appreciated.

 

Best wishes and thanks in advance,

 

Diederik

 

 

1.  
https://www.dropbox.com/s/kffj8o3nf9nz0xn/summary%28test.phylo1%29.png?dl=0

2.   https://www.dropbox.com/s/t25y37slvkqyd8l/heidel.diag.png?dl=0

3.   https://www.dropbox.com/s/kbxrc73ep1kjli5/gelman.diag.png?dl=0

4.   https://www.dropbox.com/s/0e2unw7rmr7rt2y/gelman.plot.png?dl=0

5.   https://www.dropbox.com/s/7vvqf0r7she71hu/traces.png?dl=0

 




On 6/3/2015 5:37 PM, Jörg Albrecht wrote:
> Hi Diederik,
>
> you can use MCMCglmm. The package allows for inclusion of phylogenetic
> information, random effects and zero-inflated response variables.
> However, it may take some time to get familiar with the package.
>
> Best,
>
> J
>
>
> —
> Jörg Albrecht, PhD
> Postdoctoral researcher
> Institute of Nature Conservation
> Polish Academy of Sciences
> Mickiewicza 33
> 31-120 Krakow, Poland
> www.carpathianbear.pl 
> www.globeproject.pl 
> www.iop.krakow.pl 
>
>> Am 03.06.2015 um 17:25 schrieb Diederik Strubbe
>> >:
>>
>> Dear all,
>>
>>
>>
>> I am struggling with analysing a dataset aimed at explaining invasion
>> success of non-native species. At a country level, I need to relate
>> invasion success (binomial: 0 for failed invasions, 1 for success) to
>> socio-economic variables, taking into account
>>
>> -  Phylogenetic relatedness among 

Re: [R-sig-phylo] Question on pCCA (phyl.cca)

2015-10-14 Thread Julien Clavel
Sandra,

I just realized: are you directly comparing the coefficients (slopes) of the
multiple regression analysis with the evolutionary correlations computed with
mvBM?

These are two distinct things, although they are related (same sign)! For
instance, while correlations (and evolutionary correlations) range between -1
and 1, slopes can take values outside this range.
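
A quick, non-phylogenetic toy example of the scaling (nothing here is
specific to mvBM or gls):

set.seed(1)
x <- rnorm(50)
y <- 3 * x + rnorm(50)
cor(x, y)              # a correlation, bounded in [-1, 1]
coef(lm(y ~ x))["x"]   # slope = cor(x, y) * sd(y)/sd(x); not bounded by 1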

Best,

Julien 


> From: julien.cla...@hotmail.fr
> To: gou...@mnhn.fr; liam.rev...@umb.edu
> Date: Wed, 14 Oct 2015 17:50:37 +0200
> CC: r-sig-phylo@r-project.org
> Subject: Re: [R-sig-phylo] Question on pCCA (phyl.cca)
>
> Hi Sandra,
>
> Maybe I choose a bad example to illustrate the approach. In fact I simulated 
> positive correlations so the way the p-value was computed was for 
> illustrative purpose and is wrong for negative correlations (we need a 
> two-sided test). In fact you must look whether your 95% confidence interval 
> for the parameter include 0 or not.
>
> Just replace:
> p_values <- apply(correl_boot,2,function(x){ mean(x < 0) })
>
> by something like that (if I'm not wrong)
>
> pvalue <- function(x){
> neg <- mean(x>0)
> pos <- mean(x<0)
> pval <- min(neg,pos)*2
> return(pval)
> }
>
> p_values <- apply(correl_boot,2, pvalue)
>
> This should fix the problem.
>
> Julien
>
> 
>> Date: Wed, 14 Oct 2015 11:35:43 -0300
>> Subject: Re: [R-sig-phylo] Question on pCCA (phyl.cca)
>> From: gou...@mnhn.fr
>> To: liam.rev...@umb.edu
>> CC: julien.cla...@hotmail.fr; r-sig-phylo@r-project.org
>>
>> Dear Liam and Julien,
>>
>> Thank you both for your help. I have tried both solutions on the same
>> dataset under a BM model to compare them. I expected to get similar
>> results, at least for the general conclusions on the significance of
>> correlations.
>>
>> for Julien's solution, I ran:
>>
>> model<-mvBM(tree, data, model="BM1",method="pic")
>> cov2cor(model$sigma)
>>
>> # bootstrapping to get p-values
>> data_boot <- simulate(model, tree=tree, nsim=100)
>> model_boot <- lapply(data_boot, function(x){ cov2cor(mvBM(tree, x,
>> model="BM1", method="pic", echo=FALSE, diagnostic=FALSE)$sigma) })
>> correl_boot <- t(sapply(model_boot, function(x){ x[upper.tri(x)] }))
>> p_values <- apply(correl_boot,2,function(x){ mean(x < 0) })
>>
>> # the correlations and associated p-values
>> correl <- t( cov2cor(model$sigma)[upper.tri(cov2cor(model$sigma))] )
>> res<-rbind(correl, p_values)
>>
>>
>> For Liam's solution, I ran:
>>
>> corbr<-corBrownian(phy=tree)
>>
>> glsp1<-
>> gls(ac.var1~env.var1+env.var2+body.size+ac.var2+ac.var3+ac.var4,
>> correlation=corbr, data=data)
>> glsp2<-
>> gls(ac.var2~env.var1+env.var2+body.size+ac.var1+ac.var3+ac.var4,
>> correlation=corbr, data=data)
>> glsp3<-
>> gls(ac.var3~env.var1+env.var2+body.size+ac.var1+ac.var2+ac.var4,
>> correlation=corbr, data=data)
>> glsp4<-
>> gls(ac.var4~env.var1+env.var2+body.size+ac.var1+ac.var2+ac.var3,
>> correlation=corbr, data=data)
>>
>>
>> I'm getting much more significant correlations with pGLS than with
>> mvBM, and the correlation coefficients are sometimes very different as
>> well. The results I get from the pGLS are generally more intuitive
>> biologically. For example, for the dominant frequency of calls:
>>
>>
>> DF~body.size with mvBM: -0.29 p-value=0.92 and with pGLS: -0.26
>> p-value=0.016. Here the coefficients are close, but the significance is
>> very different. Biologically, we expect a significant correlation
>> between these two variables, so it is quite surprising to have such a
>> high p-value for the mvBM method.
>>
>> DF~env.var1 with mvBM:0.35 p-value=0.03 and with pGLS: 0.021 p-value=
>> 0.09. Here the coefficients are really quite different...
>>
>> Is it normal that I'm getting such different results? Did I miss something?
>>
>> Best,
>> Sandra.
>>
>>
>>
>>
>>
>>
>>
>> 2015-10-13 15:18 GMT-03:00 Liam J. Revell
>> >:
>> Hi Sandra.
>>
>> If you are interested in the individual correlations, why not just do a
>> PGLS in which (on at a time) the variables of interest are set as the
>> independent variable.
>>
>> So, for instance, to test "is environemental variable A correlated with
>> acoustic variable Z" (controlling for all other environmental variables
>> & acoustic variables), one would do:
>>
>> acoustic variable Z ~ environmental variable A + all other
>> environmental variable + all other acoustic variables + phylogenetic
>> residual error
>>
>> If you have collinearity, then you may not be able distinguish between
>> the effects of different variables - but this will be true for all
>> statistical models. (Imagine the worst case in which variables A & B
>> were perfectly collinear, well then it becomes theoretically impossible
>> to determine if A is affecting Z or B, no matter what we do.)
>>
>> All the best, Liam
>>
>> Liam J. Revell, Associate Professor of Biology
>> University of Massachusetts Boston
>> web: 

Re: [R-sig-phylo] Question on pCCA (phyl.cca)

2015-10-14 Thread sandra goutte
Hi Julien,

Yes, you are right, I was comparing things that are not the same! However,
even after changing the p-values as you suggested, the conclusions (which are
really what I'm interested in) still differ considerably between the two
methods. For the example I gave above, the correlation is still not
significant (p-value = 0.16).

Overall I get far fewer significant correlations with this method than
with pGLS. Is there an explanation? Could it be a problem of
over-parametrisation?

Best,
Sandra.


2015-10-14 13:37 GMT-03:00 Julien Clavel :

> Sandra,
>
> I just figured out, are you directly comparing the coefficients (slopes)
> of the multiple regression analysis with the evolutionary correlations
> computed with mvBM?
>
> These are two distincts things although they are related (same sign)! For
> instance, while correlations (and evolutionary correlations) range between
> -1 and 1, the range of the slope can take higher values.
>
> Best,
>
> Julien
>
> 
> > From: julien.cla...@hotmail.fr
> > To: gou...@mnhn.fr; liam.rev...@umb.edu
> > Date: Wed, 14 Oct 2015 17:50:37 +0200
> > CC: r-sig-phylo@r-project.org
> > Subject: Re: [R-sig-phylo] Question on pCCA (phyl.cca)
> >
> > Hi Sandra,
> >
> > Maybe I choose a bad example to illustrate the approach. In fact I
> simulated positive correlations so the way the p-value was computed was for
> illustrative purpose and is wrong for negative correlations (we need a
> two-sided test). In fact you must look whether your 95% confidence interval
> for the parameter include 0 or not.
> >
> > Just replace:
> > p_values <- apply(correl_boot,2,function(x){ mean(x < 0) })
> >
> > by something like that (if I'm not wrong)
> >
> > pvalue <- function(x){
> > neg <- mean(x>0)
> > pos <- mean(x<0)
> > pval <- min(neg,pos)*2
> > return(pval)
> > }
> >
> > p_values <- apply(correl_boot,2, pvalue)
> >
> > This should fix the problem.
> >
> > Julien
> >
> > 
> >> Date: Wed, 14 Oct 2015 11:35:43 -0300
> >> Subject: Re: [R-sig-phylo] Question on pCCA (phyl.cca)
> >> From: gou...@mnhn.fr
> >> To: liam.rev...@umb.edu
> >> CC: julien.cla...@hotmail.fr; r-sig-phylo@r-project.org
> >>
> >> Dear Liam and Julien,
> >>
> >> Thank you both for your help. I have tried both solutions on the same
> >> dataset under a BM model to compare them. I expected to get similar
> >> results, at least for the general conclusions on the significance of
> >> correlations.
> >>
> >> for Julien's solution, I ran:
> >>
> >> model<-mvBM(tree, data, model="BM1",method="pic")
> >> cov2cor(model$sigma)
> >>
> >> # bootstrapping to get p-values
> >> data_boot <- simulate(model, tree=tree, nsim=100)
> >> model_boot <- lapply(data_boot, function(x){ cov2cor(mvBM(tree, x,
> >> model="BM1", method="pic", echo=FALSE, diagnostic=FALSE)$sigma) })
> >> correl_boot <- t(sapply(model_boot, function(x){ x[upper.tri(x)] }))
> >> p_values <- apply(correl_boot,2,function(x){ mean(x < 0) })
> >>
> >> # the correlations and associated p-values
> >> correl <- t( cov2cor(model$sigma)[upper.tri(cov2cor(model$sigma))] )
> >> res<-rbind(correl, p_values)
> >>
> >>
> >> For Liam's solution, I ran:
> >>
> >> corbr<-corBrownian(phy=tree)
> >>
> >> glsp1<-
> >> gls(ac.var1~env.var1+env.var2+body.size+ac.var2+ac.var3+ac.var4,
> >> correlation=corbr, data=data)
> >> glsp2<-
> >> gls(ac.var2~env.var1+env.var2+body.size+ac.var1+ac.var3+ac.var4,
> >> correlation=corbr, data=data)
> >> glsp3<-
> >> gls(ac.var3~env.var1+env.var2+body.size+ac.var1+ac.var2+ac.var4,
> >> correlation=corbr, data=data)
> >> glsp4<-
> >> gls(ac.var4~env.var1+env.var2+body.size+ac.var1+ac.var2+ac.var3,
> >> correlation=corbr, data=data)
> >>
> >>
> >> I'm getting much more significant correlations with pGLS than with
> >> mvBM, and the correlation coefficients are sometimes very different as
> >> well. The results I get from the pGLS are generally more intuitive
> >> biologically. For example, for the dominant frequency of calls:
> >>
> >>
> >> DF~body.size with mvBM: -0.29 p-value=0.92 and with pGLS: -0.26
> >> p-value=0.016. Here the coefficients are close, but the significance is
> >> very different. Biologically, we expect a significant correlation
> >> between these two variables, so it is quite surprising to have such a
> >> high p-value for the mvBM method.
> >>
> >> DF~env.var1 with mvBM:0.35 p-value=0.03 and with pGLS: 0.021 p-value=
> >> 0.09. Here the coefficients are really quite different...
> >>
> >> Is it normal that I'm getting such different results? Did I miss
> something?
> >>
> >> Best,
> >> Sandra.
> >>
> >>
> >>
> >>
> >>
> >>
> >>
> >> 2015-10-13 15:18 GMT-03:00 Liam J. Revell
> >> >:
> >> Hi Sandra.
> >>
> >> If you are interested in the individual correlations, why not just do a
> >> PGLS in which (on at a time) the variables of interest are set as the
> >> independent variable.
> >>
> >> So, for instance,