Re: [R-sig-phylo] Question on pCCA (phyl.cca)
Hi Krzysztof, I agree with you that if you want to test explicitly adaptation in a regression framework (i.e. environmental variables as predictors), slouch/mvslouch is more sounding than doing a simple regression ((p)GLS) or multiple regression. I emphasize here that the coefficients of a regression model and the correlations are really distincts. It's because all is about the nature of the variation in the response variable in a regression model (and it's why that changing the variables in a multiple regression assume different nature of errors terms). The initial question was on how to obtain the correlations between the traits, and possibly their significance, while CCA provides only the correlation between the traits sets. The significance of the correlations computed through multivariate approaches may be impeded by the number of parameters to estimate/sample size (a power issue), but in multiple regression setting if there is correlated independent variables the standard errors will be impacted and thus the p-values (in a more misleading way). It's also true that, as Krzystof pointed out, BM may not be the best evolutionary model for describing the correlation in the residuals. However the number of parameters to estimate will be even greater (for OU multivariate model such as in mvSLOUCH), and we may lack power in the significance tests of the correlations, no? Julien > Date: Sat, 17 Oct 2015 14:09:25 +0200 > From: krzysztofbartos...@wp.pl > To: r-sig-phylo@r-project.org > CC: r-sig-phylo@r-project.org > Subject: Re: [R-sig-phylo] Question on pCCA (phyl.cca) > > Hi Sandra, Julien and Liam! > One thought that I had is since you have an environmental variable and one > supposedly correlated with it is that then the slouch/mvSLOUCH (on CRAN) > models could be a possibility? In them you can have the environmental > variable(s) evolving as a Brownian motion (i.e. random drift, no selecetion) > and then the responding one(s) as an OU whose primary optimum depends on the > state of the environment. As far as I followed from the discussion you had > either both environment and responding trait > following a Brownian motion (meaning there is correlation between them but > not adaptation) or both following an OU process (meaning that the environment > is trying to adapt to something which might not be justified biologically). > > Sandra if you think such a setup - a trait adapting to a primary optimum > defined by a randomly evolving environment (this is not the same as being > correlated with it) is something useful for you I can help you out with using > mvSLOUCH. > > You can have a look at: > * Hansen, T.F.,Pienaar,J.,Orzack,S.H.,2008. A comparative method for studying > adaptation to a randomly evolving environment.Evolution 62,19651977. > (for slouch model) > * Bartoszek, K.,Pienaar,J.,Mostad,P.,Andersson,S.,Hansen,T.F.,2012. A > phylogenetic comparative method for studying multivariate adaptation > .J.Theor.Biol. 314,204215. (for mvSLOUCH model) > > Best wishes > Krzysztof > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Question on pCCA (phyl.cca)
Hi Sandra, Julien and Liam! One thought that I had is since you have an environmental variable and one supposedly correlated with it is that then the slouch/mvSLOUCH (on CRAN) models could be a possibility? In them you can have the environmental variable(s) evolving as a Brownian motion (i.e. random drift, no selecetion) and then the responding one(s) as an OU whose primary optimum depends on the state of the environment. As far as I followed from the discussion you had either both environment and responding trait following a Brownian motion (meaning there is correlation between them but not adaptation) or both following an OU process (meaning that the environment is trying to adapt to something which might not be justified biologically). Sandra if you think such a setup - a trait adapting to a primary optimum defined by a randomly evolving environment (this is not the same as being correlated with it) is something useful for you I can help you out with using mvSLOUCH. You can have a look at: * Hansen, T.F.,Pienaar,J.,Orzack,S.H.,2008. A comparative method for studying adaptation to a randomly evolving environment.Evolution 62,19651977. (for slouch model) * Bartoszek, K.,Pienaar,J.,Mostad,P.,Andersson,S.,Hansen,T.F.,2012. A phylogenetic comparative method for studying multivariate adaptation .J.Theor.Biol. 314,204215. (for mvSLOUCH model) Best wishes Krzysztof ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Question on pCCA (phyl.cca)
Hi Julien, You are right that power is a big (and generally still unexplored in the PCM context) issue. One way to address the question is after estimating the parameters of the BM/OU/mvSLOUCH model you can calculate the covariance/correlation function of the traits, say Cov(Y)(t)/Corr(Y)(t) where Y is the 2D vector trait+environment variable. The mvSLOUCH package returns these, I guess similar as stationary(model_ou) in mvMorph? The of course one can look at the magnitude of the trait+environment correlation coefficient. Regarding power difficult to say, model selection BM v OUBM might be difficult if the sample is small. But on the other hand in the mvSLOUCH paper Bartoszek et. al. (2012) we had a n=33 Cervidae set and a 2dim OU+1dim BM clearly outperformed 3dim BM (by AICc). If the trait is related to the environment then I would hazard that this should be picked up. Estimation of individual parameters in the SDE can be a tough thing but from my experience the function Cov(Y)(now) is decently estimated. Which makes sense - this is a complex combination of the SDE parameters and the ML tries to fit them to the contemporary sample which is described by E(Y)(now), Cov(Y)(now). In the 1dim OU,BM cases this was studied quite well by us (Serik Sagitov) and Cecile Ane et. al. and there seems to be quite a bit of power to get at these parameter combinations. Another possibility to look at is at the conditional expectation E(trait | environment)(now) - the evolutionary regression or its limit the optimal/limiting regression E(trait | environment)(infinity) . To get significance is not that easy, I guess the best way is a parametric bootstrap as you suggest below. But this depends on the pylogeny size, if it will not take overly long. I raised this point because depending on the model i.e. BM/OU/mvSLOUCH the design matrix of the regression should be different (Hansen et. al. 2008 discuss this at length). And the interpretation of the regression parameters might be different - e.g. you could have a phylogenetic lag in the estimated regression coefficient if the design matrix is just the measured variable. Best wishes Krzysztof > Hi Krzysztof, > > I agree with you that if you want to test explicitly adaptation in a > regression framework (i.e. environmental variables as predictors), > slouch/mvslouch is more sounding than doing a simple regression ((p)GLS) > or multiple regression. > I emphasize here that the coefficients of a regression model and the > correlations are really distincts. It's because all is about the nature > of the variation in the response variable in a regression model (and > it's why that changing the variables in a multiple regression assume > different nature of errors terms). > > The initial question was on how to obtain the correlations between the > traits, and possibly their significance, while CCA provides only the > correlation between the traits sets. > > The significance of the correlations computed through multivariate > approaches may be impeded by the number of parameters to estimate/sample > size (a power issue), but in multiple regression setting if there is > correlated independent variables the standard errors will be impacted > and thus the p-values (in a more misleading way). > It's also true that, as Krzystof pointed out, BM may not be the best > evolutionary model for describing the correlation in the residuals. > However the number of parameters to estimate will be even greater (for > OU multivariate model such as in mvSLOUCH), and we may lack power in the > significance tests of the correlations, no? > > > Julien > > > > Date: Sat, 17 Oct 2015 14:09:25 +0200 > > From: krzysztofbartos...@wp.pl > > To: r-sig-phylo@r-project.org > > CC: r-sig-phylo@r-project.org > > Subject: Re: [R-sig-phylo] Question on pCCA (phyl.cca) > > > > Hi Sandra, Julien and Liam! > > One thought that I had is since you have an environmental variable and > one supposedly correlated with it is that then the slouch/mvSLOUCH (on > CRAN) models could be a possibility? In them you can have the > environmental variable(s) evolving as a Brownian motion (i.e. random > drift, no selecetion) and then the responding one(s) as an OU whose > primary optimum depends on the state of the environment. As far as I > followed from the discussion you had either both environment and > responding trait > > following a Brownian motion (meaning there is correlation between them > but not adaptation) or both following an OU process (meaning that the > environment is trying to adapt to something which might not be justified > biologically). > > > > Sandra if you think such a setup - a trait adapting to a primary optimum > def
Re: [R-sig-phylo] Question on pCCA (phyl.cca)
Hi Sandra, Both approaches are (p)GLS, but they represent different models: multivariate and multiple regression. In the first you have several dependent variables, in the second you have several independent variables. The error associated with the different responses (or dependent) variables may have different variances and may be correlated in the multivariate model. For sure you can dilute the power of the analysis when you try to make more estimates with the same amount of data (as it is the case in the multivariate model). The approach which consist in replacing the dependent and independent variable, is similar to what is called "reverse regression". But doing so you make particular assumptions because the nature of the stochastic error is different in each regressions. Maybe someone have more thought on this and if this can be misleading? Julien > Date: Wed, 14 Oct 2015 16:13:03 -0300 > Subject: Re: [R-sig-phylo] Question on pCCA (phyl.cca) > From: gou...@mnhn.fr > To: julien.cla...@hotmail.fr > CC: liam.rev...@umb.edu; r-sig-phylo@r-project.org > > Oi Julien, > > Yes you are right, I was comparing things that are not the same! > However, even changing the p-values as you suggested, the conclusions > (which is really what i'm interested in) are still much different > between the two methods. For the example I gave above, the correlation > is still not significant (p-value = 0.16). > > Overall i get much less significant correlations with this method > compared to pGLS. Is there an explanation? Could it be a problem of > over-parametrisation? > > Best, > Sandra. > > > 2015-10-14 13:37 GMT-03:00 Julien Clavel > <julien.cla...@hotmail.fr<mailto:julien.cla...@hotmail.fr>>: > Sandra, > > I just figured out, are you directly comparing the coefficients > (slopes) of the multiple regression analysis with the evolutionary > correlations computed with mvBM? > > These are two distincts things although they are related (same sign)! > For instance, while correlations (and evolutionary correlations) range > between -1 and 1, the range of the slope can take higher values. > > Best, > > Julien > > >> From: julien.cla...@hotmail.fr<mailto:julien.cla...@hotmail.fr> >> To: gou...@mnhn.fr<mailto:gou...@mnhn.fr>; > liam.rev...@umb.edu<mailto:liam.rev...@umb.edu> >> Date: Wed, 14 Oct 2015 17:50:37 +0200 >> CC: r-sig-phylo@r-project.org<mailto:r-sig-phylo@r-project.org> >> Subject: Re: [R-sig-phylo] Question on pCCA (phyl.cca) >> >> Hi Sandra, >> >> Maybe I choose a bad example to illustrate the approach. In fact I > simulated positive correlations so the way the p-value was computed was > for illustrative purpose and is wrong for negative correlations (we > need a two-sided test). In fact you must look whether your 95% > confidence interval for the parameter include 0 or not. >> >> Just replace: >> p_values <- apply(correl_boot,2,function(x){ mean(x < 0) }) >> >> by something like that (if I'm not wrong) >> >> pvalue <- function(x){ >> neg <- mean(x>0) >> pos <- mean(x<0) >> pval <- min(neg,pos)*2 >> return(pval) >> } >> >> p_values <- apply(correl_boot,2, pvalue) >> >> This should fix the problem. >> >> Julien >> >> >>> Date: Wed, 14 Oct 2015 11:35:43 -0300 >>> Subject: Re: [R-sig-phylo] Question on pCCA (phyl.cca) >>> From: gou...@mnhn.fr<mailto:gou...@mnhn.fr> >>> To: liam.rev...@umb.edu<mailto:liam.rev...@umb.edu> >>> CC: julien.cla...@hotmail.fr<mailto:julien.cla...@hotmail.fr>; > r-sig-phylo@r-project.org<mailto:r-sig-phylo@r-project.org> >>> >>> Dear Liam and Julien, >>> >>> Thank you both for your help. I have tried both solutions on the same >>> dataset under a BM model to compare them. I expected to get similar >>> results, at least for the general conclusions on the significance of >>> correlations. >>> >>> for Julien's solution, I ran: >>> >>> model<-mvBM(tree, data, model="BM1",method="pic") >>> cov2cor(model$sigma) >>> >>> # bootstrapping to get p-values >>> data_boot <- simulate(model, tree=tree, nsim=100) >>> model_boot <- lapply(data_boot, function(x){ cov2cor(mvBM(tree, x, >>> model="BM1", method="pic", echo=FALSE, diagnostic=FALSE)$sigma) }) >>> correl_boot <- t(sapply(model_boot, function(x){ x[upper.tri(x)] })) >>&
Re: [R-sig-phylo] Question on pCCA (phyl.cca)
ta, model="BM1",method="pic", >> param=list(constraint="diagonal")) >> >> # Test the significance with an LRT test >> LRT(model,model_2) >> >> >> >> >> # But if you want to assess the significance for each traits when there >> is more than two traits, use bootstrap instead: >> >> # Simulate data under the MLE and then fit the model to simulated traits >> data_boot <- simulate(model, tree=tree, nsim=100) >> model_boot <- lapply(data_boot, function(x){ cov2cor(mvBM(tree, x, >> model="BM1", method="pic", echo=FALSE, diagnostic=FALSE)$sigma) }) >> >> # Retrieve the results >> correl_boot <- t(sapply(model_boot, function(x){ x[upper.tri(x)] })) >> colnames(correl_boot)<-c("[1,2]","[1,3]","[2,3]") >> >> # Plot the results >> boxplot(correl_boot, main="Correlations") >> >> # Assess if simulated data under the MLE are significantly different from >> zero >> p_values <- apply(correl_boot,2,function(x){ mean(x < 0) }) >> p_values >> >> >> >> # You can also use the same approach described above if the best fit >> process is an Ornstein-Uhlenbeck (which is unlikely here); e.g.: >> >> model_ou <- mvOU(tree, data, model="OU1", method="rpf", >> param=list(vcv="ouch")) >> cov2cor(stationary(model_ou)) >> >> There is some related examples in the package vignette. >> HTH, >> >> >> Julien >> >> Date: Mon, 12 Oct 2015 14:34:26 -0300 >>> From: gou...@mnhn.fr >>> To: r-sig-phylo@r-project.org >>> Subject: [R-sig-phylo] Question on pCCA (phyl.cca) >>> >>> Dear list, >>> >>> I have 2 sets of continuous variables (acoustic and environmental) and a >>> phylogeny for a group of species. I am looking for correlations between >>> the >>> environmental variables and the acoustic ones. I ran a pCCA (phyl.cca >>> {phytools} ), which gave me p-values for the correlations of each of my >>> environmental variables with* all* acoustic variables. However, i have >>> specific hyptotheses I would like to test such as: is environemental >>> variable A correlated with acoustic variable Z? >>> >>> If I run univariate pGLS for each of my acoustic variables, I have the >>> problem that I am not taking into account the potential colinearity >>> between >>> my acoustic variables... So I can first run pPCA, but then i'm not really >>> testing my original hypotheses on specific variables. >>> >>> So my question is: is there a way to test the significance of correlation >>> among continuous variables from two sets (taking the phylogenetic >>> componant >>> into account)? >>> >>> Any input will be very much apreciated! >>> Best, >>> Sandra. >>> >>> [[alternative HTML version deleted]] >>> >>> ___ >>> R-sig-phylo mailing list - R-sig-phylo@r-project.org >>> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo >>> Searchable archive at >>> http://www.mail-archive.com/r-sig-phylo@r-project.org/ >>> >> >> [[alternative HTML version deleted]] >> >> ___ >> R-sig-phylo mailing list - R-sig-phylo@r-project.org >> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo >> Searchable archive at >> http://www.mail-archive.com/r-sig-phylo@r-project.org/ >> >> -- Ph.D. Laboratório de História Natural de Anfíbios Brasileiros (LaHNAB <http://www.naturalhistory.com.br/>) Departamento de Biologia Animal (Bloco P) Instituto de Biologia (IB) Universidade Estadual de Campinas (UNICAMP) Rua Monteiro Lobato, 255, CEP 13083-862 Campinas, São Paulo Brazil tel: +55 (19) 99978-7173 Flickr: https://www.flickr.com/gp/133250906@N05/g4196q [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Question on pCCA (phyl.cca)
Hi Sandra, Maybe I choose a bad example to illustrate the approach. In fact I simulated positive correlations so the way the p-value was computed was for illustrative purpose and is wrong for negative correlations (we need a two-sided test). In fact you must look whether your 95% confidence interval for the parameter include 0 or not. Just replace: p_values <- apply(correl_boot,2,function(x){ mean(x < 0) }) by something like that (if I'm not wrong) pvalue <- function(x){ neg <- mean(x>0) pos <- mean(x<0) pval <- min(neg,pos)*2 return(pval) } p_values <- apply(correl_boot,2, pvalue) This should fix the problem. Julien > Date: Wed, 14 Oct 2015 11:35:43 -0300 > Subject: Re: [R-sig-phylo] Question on pCCA (phyl.cca) > From: gou...@mnhn.fr > To: liam.rev...@umb.edu > CC: julien.cla...@hotmail.fr; r-sig-phylo@r-project.org > > Dear Liam and Julien, > > Thank you both for your help. I have tried both solutions on the same > dataset under a BM model to compare them. I expected to get similar > results, at least for the general conclusions on the significance of > correlations. > > for Julien's solution, I ran: > > model<-mvBM(tree, data, model="BM1",method="pic") > cov2cor(model$sigma) > > # bootstrapping to get p-values > data_boot <- simulate(model, tree=tree, nsim=100) > model_boot <- lapply(data_boot, function(x){ cov2cor(mvBM(tree, x, > model="BM1", method="pic", echo=FALSE, diagnostic=FALSE)$sigma) }) > correl_boot <- t(sapply(model_boot, function(x){ x[upper.tri(x)] })) > p_values <- apply(correl_boot,2,function(x){ mean(x < 0) }) > > # the correlations and associated p-values > correl <- t( cov2cor(model$sigma)[upper.tri(cov2cor(model$sigma))] ) > res<-rbind(correl, p_values) > > > For Liam's solution, I ran: > > corbr<-corBrownian(phy=tree) > > glsp1<- > gls(ac.var1~env.var1+env.var2+body.size+ac.var2+ac.var3+ac.var4, > correlation=corbr, data=data) > glsp2<- > gls(ac.var2~env.var1+env.var2+body.size+ac.var1+ac.var3+ac.var4, > correlation=corbr, data=data) > glsp3<- > gls(ac.var3~env.var1+env.var2+body.size+ac.var1+ac.var2+ac.var4, > correlation=corbr, data=data) > glsp4<- > gls(ac.var4~env.var1+env.var2+body.size+ac.var1+ac.var2+ac.var3, > correlation=corbr, data=data) > > > I'm getting much more significant correlations with pGLS than with > mvBM, and the correlation coefficients are sometimes very different as > well. The results I get from the pGLS are generally more intuitive > biologically. For example, for the dominant frequency of calls: > > > DF~body.size with mvBM: -0.29 p-value=0.92 and with pGLS: -0.26 > p-value=0.016. Here the coefficients are close, but the significance is > very different. Biologically, we expect a significant correlation > between these two variables, so it is quite surprising to have such a > high p-value for the mvBM method. > > DF~env.var1 with mvBM:0.35 p-value=0.03 and with pGLS: 0.021 p-value= > 0.09. Here the coefficients are really quite different... > > Is it normal that I'm getting such different results? Did I miss something? > > Best, > Sandra. > > > > > > > > 2015-10-13 15:18 GMT-03:00 Liam J. Revell > <liam.rev...@umb.edu<mailto:liam.rev...@umb.edu>>: > Hi Sandra. > > If you are interested in the individual correlations, why not just do a > PGLS in which (on at a time) the variables of interest are set as the > independent variable. > > So, for instance, to test "is environemental variable A correlated with > acoustic variable Z" (controlling for all other environmental variables > & acoustic variables), one would do: > > acoustic variable Z ~ environmental variable A + all other > environmental variable + all other acoustic variables + phylogenetic > residual error > > If you have collinearity, then you may not be able distinguish between > the effects of different variables - but this will be true for all > statistical models. (Imagine the worst case in which variables A & B > were perfectly collinear, well then it becomes theoretically impossible > to determine if A is affecting Z or B, no matter what we do.) > > All the best, Liam > > Liam J. Revell, Associate Professor of Biology > University of Massachusetts Boston > web: http://faculty.umb.edu/liam.revell/ > email: liam.rev...@umb.edu<mailto:liam.rev...@umb.edu> > blog: http://blog.phytools.org > > > On 10/13/2015 5:47 AM, Julien Clavel wrote: > Hi Sandra, > > I think, a mult
Re: [R-sig-phylo] Question on pCCA (phyl.cca)
Sandra, I just figured out, are you directly comparing the coefficients (slopes) of the multiple regression analysis with the evolutionary correlations computed with mvBM? These are two distincts things although they are related (same sign)! For instance, while correlations (and evolutionary correlations) range between -1 and 1, the range of the slope can take higher values. Best, Julien > From: julien.cla...@hotmail.fr > To: gou...@mnhn.fr; liam.rev...@umb.edu > Date: Wed, 14 Oct 2015 17:50:37 +0200 > CC: r-sig-phylo@r-project.org > Subject: Re: [R-sig-phylo] Question on pCCA (phyl.cca) > > Hi Sandra, > > Maybe I choose a bad example to illustrate the approach. In fact I simulated > positive correlations so the way the p-value was computed was for > illustrative purpose and is wrong for negative correlations (we need a > two-sided test). In fact you must look whether your 95% confidence interval > for the parameter include 0 or not. > > Just replace: > p_values <- apply(correl_boot,2,function(x){ mean(x < 0) }) > > by something like that (if I'm not wrong) > > pvalue <- function(x){ > neg <- mean(x>0) > pos <- mean(x<0) > pval <- min(neg,pos)*2 > return(pval) > } > > p_values <- apply(correl_boot,2, pvalue) > > This should fix the problem. > > Julien > > >> Date: Wed, 14 Oct 2015 11:35:43 -0300 >> Subject: Re: [R-sig-phylo] Question on pCCA (phyl.cca) >> From: gou...@mnhn.fr >> To: liam.rev...@umb.edu >> CC: julien.cla...@hotmail.fr; r-sig-phylo@r-project.org >> >> Dear Liam and Julien, >> >> Thank you both for your help. I have tried both solutions on the same >> dataset under a BM model to compare them. I expected to get similar >> results, at least for the general conclusions on the significance of >> correlations. >> >> for Julien's solution, I ran: >> >> model<-mvBM(tree, data, model="BM1",method="pic") >> cov2cor(model$sigma) >> >> # bootstrapping to get p-values >> data_boot <- simulate(model, tree=tree, nsim=100) >> model_boot <- lapply(data_boot, function(x){ cov2cor(mvBM(tree, x, >> model="BM1", method="pic", echo=FALSE, diagnostic=FALSE)$sigma) }) >> correl_boot <- t(sapply(model_boot, function(x){ x[upper.tri(x)] })) >> p_values <- apply(correl_boot,2,function(x){ mean(x < 0) }) >> >> # the correlations and associated p-values >> correl <- t( cov2cor(model$sigma)[upper.tri(cov2cor(model$sigma))] ) >> res<-rbind(correl, p_values) >> >> >> For Liam's solution, I ran: >> >> corbr<-corBrownian(phy=tree) >> >> glsp1<- >> gls(ac.var1~env.var1+env.var2+body.size+ac.var2+ac.var3+ac.var4, >> correlation=corbr, data=data) >> glsp2<- >> gls(ac.var2~env.var1+env.var2+body.size+ac.var1+ac.var3+ac.var4, >> correlation=corbr, data=data) >> glsp3<- >> gls(ac.var3~env.var1+env.var2+body.size+ac.var1+ac.var2+ac.var4, >> correlation=corbr, data=data) >> glsp4<- >> gls(ac.var4~env.var1+env.var2+body.size+ac.var1+ac.var2+ac.var3, >> correlation=corbr, data=data) >> >> >> I'm getting much more significant correlations with pGLS than with >> mvBM, and the correlation coefficients are sometimes very different as >> well. The results I get from the pGLS are generally more intuitive >> biologically. For example, for the dominant frequency of calls: >> >> >> DF~body.size with mvBM: -0.29 p-value=0.92 and with pGLS: -0.26 >> p-value=0.016. Here the coefficients are close, but the significance is >> very different. Biologically, we expect a significant correlation >> between these two variables, so it is quite surprising to have such a >> high p-value for the mvBM method. >> >> DF~env.var1 with mvBM:0.35 p-value=0.03 and with pGLS: 0.021 p-value= >> 0.09. Here the coefficients are really quite different... >> >> Is it normal that I'm getting such different results? Did I miss something? >> >> Best, >> Sandra. >> >> >> >> >> >> >> >> 2015-10-13 15:18 GMT-03:00 Liam J. Revell >> <liam.rev...@umb.edu<mailto:liam.rev...@umb.edu>>: >> Hi Sandra. >> >> If you are interested in the individual correlations, why not just do a >> PGLS in which (on at a time) the variables of interest are set as the >> independent variable. >> >> So, for instance, to test "is environemental variable A correlated with >> acoustic variable Z" (c
Re: [R-sig-phylo] Question on pCCA (phyl.cca)
Oi Julien, Yes you are right, I was comparing things that are not the same! However, even changing the p-values as you suggested, the conclusions (which is really what i'm interested in) are still much different between the two methods. For the example I gave above, the correlation is still not significant (p-value = 0.16). Overall i get much less significant correlations with this method compared to pGLS. Is there an explanation? Could it be a problem of over-parametrisation? Best, Sandra. 2015-10-14 13:37 GMT-03:00 Julien Clavel <julien.cla...@hotmail.fr>: > Sandra, > > I just figured out, are you directly comparing the coefficients (slopes) > of the multiple regression analysis with the evolutionary correlations > computed with mvBM? > > These are two distincts things although they are related (same sign)! For > instance, while correlations (and evolutionary correlations) range between > -1 and 1, the range of the slope can take higher values. > > Best, > > Julien > > > > From: julien.cla...@hotmail.fr > > To: gou...@mnhn.fr; liam.rev...@umb.edu > > Date: Wed, 14 Oct 2015 17:50:37 +0200 > > CC: r-sig-phylo@r-project.org > > Subject: Re: [R-sig-phylo] Question on pCCA (phyl.cca) > > > > Hi Sandra, > > > > Maybe I choose a bad example to illustrate the approach. In fact I > simulated positive correlations so the way the p-value was computed was for > illustrative purpose and is wrong for negative correlations (we need a > two-sided test). In fact you must look whether your 95% confidence interval > for the parameter include 0 or not. > > > > Just replace: > > p_values <- apply(correl_boot,2,function(x){ mean(x < 0) }) > > > > by something like that (if I'm not wrong) > > > > pvalue <- function(x){ > > neg <- mean(x>0) > > pos <- mean(x<0) > > pval <- min(neg,pos)*2 > > return(pval) > > } > > > > p_values <- apply(correl_boot,2, pvalue) > > > > This should fix the problem. > > > > Julien > > > > > >> Date: Wed, 14 Oct 2015 11:35:43 -0300 > >> Subject: Re: [R-sig-phylo] Question on pCCA (phyl.cca) > >> From: gou...@mnhn.fr > >> To: liam.rev...@umb.edu > >> CC: julien.cla...@hotmail.fr; r-sig-phylo@r-project.org > >> > >> Dear Liam and Julien, > >> > >> Thank you both for your help. I have tried both solutions on the same > >> dataset under a BM model to compare them. I expected to get similar > >> results, at least for the general conclusions on the significance of > >> correlations. > >> > >> for Julien's solution, I ran: > >> > >> model<-mvBM(tree, data, model="BM1",method="pic") > >> cov2cor(model$sigma) > >> > >> # bootstrapping to get p-values > >> data_boot <- simulate(model, tree=tree, nsim=100) > >> model_boot <- lapply(data_boot, function(x){ cov2cor(mvBM(tree, x, > >> model="BM1", method="pic", echo=FALSE, diagnostic=FALSE)$sigma) }) > >> correl_boot <- t(sapply(model_boot, function(x){ x[upper.tri(x)] })) > >> p_values <- apply(correl_boot,2,function(x){ mean(x < 0) }) > >> > >> # the correlations and associated p-values > >> correl <- t( cov2cor(model$sigma)[upper.tri(cov2cor(model$sigma))] ) > >> res<-rbind(correl, p_values) > >> > >> > >> For Liam's solution, I ran: > >> > >> corbr<-corBrownian(phy=tree) > >> > >> glsp1<- > >> gls(ac.var1~env.var1+env.var2+body.size+ac.var2+ac.var3+ac.var4, > >> correlation=corbr, data=data) > >> glsp2<- > >> gls(ac.var2~env.var1+env.var2+body.size+ac.var1+ac.var3+ac.var4, > >> correlation=corbr, data=data) > >> glsp3<- > >> gls(ac.var3~env.var1+env.var2+body.size+ac.var1+ac.var2+ac.var4, > >> correlation=corbr, data=data) > >> glsp4<- > >> gls(ac.var4~env.var1+env.var2+body.size+ac.var1+ac.var2+ac.var3, > >> correlation=corbr, data=data) > >> > >> > >> I'm getting much more significant correlations with pGLS than with > >> mvBM, and the correlation coefficients are sometimes very different as > >> well. The results I get from the pGLS are generally more intuitive > >> biologically. For example, for the dominant frequency of calls: > >> > >> > >> DF~body.size with mvBM: -0.29 p-value=0.92 and with pGLS: -0.26 > >> p-value=0.016. Here the coefficien
Re: [R-sig-phylo] Question on pCCA (phyl.cca)
Hi Sandra, I think, a multivariate correlational model with your continuous traits as response variables can be used to do what you want. For instance, assuming Brownian motion: library(mvMORPH) set.seed(123) ## First simulate some data (3 traits) # simulate a tree tree <- pbtree(n=100) # Define a correlation matrix C <- matrix(c(1,0.7,0.3,0.7,1,0.3,0.3,0.3,1),3) # Define the variance to build a covariance matrix D <- diag(c(sqrt(0.3),sqrt(0.2),sqrt(0.1)),3) # Compute the covariance matrix sigma <- D%*%C%*%D # Ancestral states theta <- c(0,0,0) # simulate the traits data<-mvSIM(tree, param=list(sigma=sigma, ntraits=3, mu=theta, names_traits=c("Trait 1","Trait 2", "Trait 3")), model="BM1", nsim=1) ## Now we fit the multivariate brownian model model<-mvBM(tree, data, model="BM1",method="pic") # Obtain the traits correlations (very close to the generating values): cov2cor(model$sigma) # You can compare the fit of a model with independent Brownian motion model_2<-mvBM(tree, data, model="BM1",method="pic", param=list(constraint="diagonal")) # Test the significance with an LRT test LRT(model,model_2) # But if you want to assess the significance for each traits when there is more than two traits, use bootstrap instead: # Simulate data under the MLE and then fit the model to simulated traits data_boot <- simulate(model, tree=tree, nsim=100) model_boot <- lapply(data_boot, function(x){ cov2cor(mvBM(tree, x, model="BM1", method="pic", echo=FALSE, diagnostic=FALSE)$sigma) }) # Retrieve the results correl_boot <- t(sapply(model_boot, function(x){ x[upper.tri(x)] })) colnames(correl_boot)<-c("[1,2]","[1,3]","[2,3]") # Plot the results boxplot(correl_boot, main="Correlations") # Assess if simulated data under the MLE are significantly different from zero p_values <- apply(correl_boot,2,function(x){ mean(x < 0) }) p_values # You can also use the same approach described above if the best fit process is an Ornstein-Uhlenbeck (which is unlikely here); e.g.: model_ou <- mvOU(tree, data, model="OU1", method="rpf", param=list(vcv="ouch")) cov2cor(stationary(model_ou)) There is some related examples in the package vignette. HTH, Julien > Date: Mon, 12 Oct 2015 14:34:26 -0300 > From: gou...@mnhn.fr > To: r-sig-phylo@r-project.org > Subject: [R-sig-phylo] Question on pCCA (phyl.cca) > > Dear list, > > I have 2 sets of continuous variables (acoustic and environmental) and a > phylogeny for a group of species. I am looking for correlations between the > environmental variables and the acoustic ones. I ran a pCCA (phyl.cca > {phytools} ), which gave me p-values for the correlations of each of my > environmental variables with* all* acoustic variables. However, i have > specific hyptotheses I would like to test such as: is environemental > variable A correlated with acoustic variable Z? > > If I run univariate pGLS for each of my acoustic variables, I have the > problem that I am not taking into account the potential colinearity between > my acoustic variables... So I can first run pPCA, but then i'm not really > testing my original hypotheses on specific variables. > > So my question is: is there a way to test the significance of correlation > among continuous variables from two sets (taking the phylogenetic componant > into account)? > > Any input will be very much apreciated! > Best, > Sandra. > > [[alternative HTML version deleted]] > > ___ > R-sig-phylo mailing list - R-sig-phylo@r-project.org > https://stat.ethz.ch/mailman/listinfo/r-sig-phylo > Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Question on pCCA (phyl.cca)
Hi Sandra. If you are interested in the individual correlations, why not just do a PGLS in which (on at a time) the variables of interest are set as the independent variable. So, for instance, to test "is environemental variable A correlated with acoustic variable Z" (controlling for all other environmental variables & acoustic variables), one would do: acoustic variable Z ~ environmental variable A + all other environmental variable + all other acoustic variables + phylogenetic residual error If you have collinearity, then you may not be able distinguish between the effects of different variables - but this will be true for all statistical models. (Imagine the worst case in which variables A & B were perfectly collinear, well then it becomes theoretically impossible to determine if A is affecting Z or B, no matter what we do.) All the best, Liam Liam J. Revell, Associate Professor of Biology University of Massachusetts Boston web: http://faculty.umb.edu/liam.revell/ email: liam.rev...@umb.edu blog: http://blog.phytools.org On 10/13/2015 5:47 AM, Julien Clavel wrote: Hi Sandra, I think, a multivariate correlational model with your continuous traits as response variables can be used to do what you want. For instance, assuming Brownian motion: library(mvMORPH) set.seed(123) ## First simulate some data (3 traits) # simulate a tree tree <- pbtree(n=100) # Define a correlation matrix C <- matrix(c(1,0.7,0.3,0.7,1,0.3,0.3,0.3,1),3) # Define the variance to build a covariance matrix D <- diag(c(sqrt(0.3),sqrt(0.2),sqrt(0.1)),3) # Compute the covariance matrix sigma <- D%*%C%*%D # Ancestral states theta <- c(0,0,0) # simulate the traits data<-mvSIM(tree, param=list(sigma=sigma, ntraits=3, mu=theta, names_traits=c("Trait 1","Trait 2", "Trait 3")), model="BM1", nsim=1) ## Now we fit the multivariate brownian model model<-mvBM(tree, data, model="BM1",method="pic") # Obtain the traits correlations (very close to the generating values): cov2cor(model$sigma) # You can compare the fit of a model with independent Brownian motion model_2<-mvBM(tree, data, model="BM1",method="pic", param=list(constraint="diagonal")) # Test the significance with an LRT test LRT(model,model_2) # But if you want to assess the significance for each traits when there is more than two traits, use bootstrap instead: # Simulate data under the MLE and then fit the model to simulated traits data_boot <- simulate(model, tree=tree, nsim=100) model_boot <- lapply(data_boot, function(x){ cov2cor(mvBM(tree, x, model="BM1", method="pic", echo=FALSE, diagnostic=FALSE)$sigma) }) # Retrieve the results correl_boot <- t(sapply(model_boot, function(x){ x[upper.tri(x)] })) colnames(correl_boot)<-c("[1,2]","[1,3]","[2,3]") # Plot the results boxplot(correl_boot, main="Correlations") # Assess if simulated data under the MLE are significantly different from zero p_values <- apply(correl_boot,2,function(x){ mean(x < 0) }) p_values # You can also use the same approach described above if the best fit process is an Ornstein-Uhlenbeck (which is unlikely here); e.g.: model_ou <- mvOU(tree, data, model="OU1", method="rpf", param=list(vcv="ouch")) cov2cor(stationary(model_ou)) There is some related examples in the package vignette. HTH, Julien Date: Mon, 12 Oct 2015 14:34:26 -0300 From: gou...@mnhn.fr To: r-sig-phylo@r-project.org Subject: [R-sig-phylo] Question on pCCA (phyl.cca) Dear list, I have 2 sets of continuous variables (acoustic and environmental) and a phylogeny for a group of species. I am looking for correlations between the environmental variables and the acoustic ones. I ran a pCCA (phyl.cca {phytools} ), which gave me p-values for the correlations of each of my environmental variables with* all* acoustic variables. However, i have specific hyptotheses I would like to test such as: is environemental variable A correlated with acoustic variable Z? If I run univariate pGLS for each of my acoustic variables, I have the problem that I am not taking into account the potential colinearity between my acoustic variables... So I can first run pPCA, but then i'm not really testing my original hypotheses on specific variables. So my question is: is there a way to test the significance of correlation among continuous variables from two sets (taking the phylogenetic componant into account)? Any input will be very much apreciated! Best, Sandra. [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archi
[R-sig-phylo] Question on pCCA (phyl.cca)
Dear list, I have 2 sets of continuous variables (acoustic and environmental) and a phylogeny for a group of species. I am looking for correlations between the environmental variables and the acoustic ones. I ran a pCCA (phyl.cca {phytools} ), which gave me p-values for the correlations of each of my environmental variables with* all* acoustic variables. However, i have specific hyptotheses I would like to test such as: is environemental variable A correlated with acoustic variable Z? If I run univariate pGLS for each of my acoustic variables, I have the problem that I am not taking into account the potential colinearity between my acoustic variables... So I can first run pPCA, but then i'm not really testing my original hypotheses on specific variables. So my question is: is there a way to test the significance of correlation among continuous variables from two sets (taking the phylogenetic componant into account)? Any input will be very much apreciated! Best, Sandra. [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/