Re: [R-sig-phylo] PGLS with different correlation structures, model selection, REML vs ML
Thanks Emmanuel and Jörg! On Tue, Jan 20, 2015 at 8:40 AM, Emmanuel Paradis emmanuel.para...@ird.fr wrote: It seems I forgot to Cc my reply to the list... Emmanuel Message transféré Sujet : Re: [R-sig-phylo] PGLS with different correlation structures, model selection, REML vs ML Date : Fri, 09 Jan 2015 12:09:13 +0100 De : Emmanuel Paradis emmanuel.para...@ird.fr Pour : Brian A. Gill gillbri...@gmail.com Hi Brian, Le 08/01/2015 19:57, Brian A. Gill a écrit : Dear List-Serv Users. I am running PGLS across large blocks of trees with the corPagel and corMartins correlation structures with maximum likelihood estimation of parameters. In the literature, it seems like people using PGLS use either a Brownian motion or OU approach, and you don't see many examples of people trying both and then using model selection to determine which is better. Note: corMartins() implements an OU-based correlation strucrture. I am planning on running both and then summing Akakie weights across all trees for each correlation structure to determine which is better supported. Is this kind of model selection approach reasonable? Yes. AIC was originally developed with the idea to find the correct (auto-)correlation structure in time-series data. So it makes sense to use it to select an appropriate phylogenetic correlation structure. Should I use the AIC value determined using REML or ML? If you use the same regression model (ie, the fixed effects part of the model) then AIC_REML should be OK and, I suspect, will give you similar results than AIC_ML. Only the latter should be used if models with different regressions are compared (of course, providing the same response variable is used in all models). If I also wanted to present model averaged (weighted by AIC) estimates of slope/intercept for each correlation structure do I use the estimates from REML or ML? Regression coefficients (ie, the parameters of the fixed-effect) are the same with ML or REML (at least, this is what is expected). Best, Emmanuel Thanks and Happy New Year! Brian ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r- sig-ph...@r-project.org/ -- Brian A. Gill VISIT MY WEBSITE: http://gillbriana.wix.com/brian-gill FOLLOW ME ON TWITTER: @CSUBrianGill Colorado State University Biology 1878 Campus Delivery Fort Collins, CO 80523 970-215-7037 [[alternative HTML version deleted]] ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
[R-sig-phylo] Partitioning phylogentic and environment effect on trait distribution
Dear all, Is it possible to quantify the influence of phylogeny and environment on a measured trait? Say that trait y has been measured in several species located in different habitats. Some species have been measured in more than one habitat (less specialized species). Now I want to say something about how much variation in my trait that may be explained by phylogenetic structure (I have the phyl tree) and the habitat. I also have multiple measurements per species in the habitat(s) where is occurs (replicates) but lets ignore that for now and only use the means. One way would be to calculate phylogenetic corrected trait means (as in Bloomberg's K). Here it is easy to add rows/columns in the vcv.phylo(tree) for species that occurs in more than one habitat and it is assumed that they have the same postion in the tree (ie within-species correlation is = 1) Residual sums of squares for the phylo effect can be calculated and the corrected trait means are used in a linear model to test the habitat effect. R2 for both factors (phylo, habitat) can then be calculated. Another method is to use gls(). But I dont know how to deal with species occurring in multiple habitats here (can you feed the gls with a manually contructed var-covariance matrix?). If I drop some species x habitat combinations (so species can only occur in one habitat) I can do: bm.sp - corBrownian(phy = tree) fit0 - gls(trait ~ 1, sp.means, method = ML) fit1 - gls(trait ~ 1, sp.means, correlation = bm.sp, method = ML) fit2 - gls(trait ~ Habitat, sp.means, correlation = bm.sp, method = ML) anova(fit0, fit1, fit2) #check AIC #pseudoR2 (not perfect, I know, but indicate somewhat the importance) G2 - (-2*(logLik(fit0) - logLik(fit1)))[1] n = nrow(sp.means) r2ML.phyl - 1 - exp(-G2/n) #phyl effect G2 - (-2*(logLik(fit0)-logLik(fit2)))[1] r2ML.both - 1 - exp(-G2/n) r2ML.hab - r2ML.both - r2ML.phyl #habitat effect As mentioned, there are many assumptions here (BM etc) but I wonder if this sort of approach is in general valid? Is there a standard way to deal with these kind of problems (e.g. same species in multiple groups)? Is there a way to incorporate replicates? (manually add/rows columns in the phylogenetic varcovar-matrix?) (this has been on the list before, MCMCglmm has been mentioned and maybe a more complete modeling approach is the way to go here) Key papers? Cheers Gustaf -- Gustaf Granath (PhD) Post doc Swedish University of Agricultural Sciences Department of Aquatic Sciences and Assessment ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/
Re: [R-sig-phylo] Partitioning phylogentic and environment effect on trait distribution
Dear Gustaf, How many levels of `habitat' are there, and are they cross-classified with respect to species (i.e. are multiple species measured in the same habitat)? Assuming for now there are a reasonable number of habitats then the simplest model (without cross-classification) in asreml/MCMCglmm would be Ainv-inverseA(my_tree)$Ainv m1.asreml-asreml(trait~nhabitat, random =~ habitat+giv(species)+species.ide, data=my_data, ginverse=list(species=sm2asreml(Ainv))) m1.mcmc-MCMCglmm(trait~nhabitat, random =~ habitat+species+species.ide, data=my_data, ginverse=list(species=Ainv)) nhabitat is the number of habitats a species is found in. species.ide is the same as species and models the between species variation not captured by the BM model. Note that this requires you using the individual data rather than the species means (which is to be preferred). With cross-classification you may want to have the effects of the same habitat vary across species. There's many ways to do this, but for now lets assume i) that in general they could be positively correlated (if the trait increases for species 1 in habitat 1 then it is also likely to increase the trait in species 2) and ii) if species 1 and species 2 are close relatives the habitat is likely to effect them in a more similar way: m2.asreml-asreml(trait~nhabitat, random =~ habitat+giv(species)+giv(species):habitat+species.ide, data=my_data, ginverse=list(species=sm2asreml(Ainv))) m2.mcmc-MCMCglmm(trait~nhabitat, random =~ habitat+species+species:habitat+species.ide, data=my_data, ginverse=list(species=Ainv)) There's many more possibilities of course, depending on the biology of the problem. Cheers, Jarrod Quoting Gustaf Granath gustaf.gran...@gmail.com on Wed, 21 Jan 2015 15:59:23 +0100: Dear all, Is it possible to quantify the influence of phylogeny and environment on a measured trait? Say that trait y has been measured in several species located in different habitats. Some species have been measured in more than one habitat (less specialized species). Now I want to say something about how much variation in my trait that may be explained by phylogenetic structure (I have the phyl tree) and the habitat. I also have multiple measurements per species in the habitat(s) where is occurs (replicates) but lets ignore that for now and only use the means. One way would be to calculate phylogenetic corrected trait means (as in Bloomberg's K). Here it is easy to add rows/columns in the vcv.phylo(tree) for species that occurs in more than one habitat and it is assumed that they have the same postion in the tree (ie within-species correlation is = 1) Residual sums of squares for the phylo effect can be calculated and the corrected trait means are used in a linear model to test the habitat effect. R2 for both factors (phylo, habitat) can then be calculated. Another method is to use gls(). But I dont know how to deal with species occurring in multiple habitats here (can you feed the gls with a manually contructed var-covariance matrix?). If I drop some species x habitat combinations (so species can only occur in one habitat) I can do: bm.sp - corBrownian(phy = tree) fit0 - gls(trait ~ 1, sp.means, method = ML) fit1 - gls(trait ~ 1, sp.means, correlation = bm.sp, method = ML) fit2 - gls(trait ~ Habitat, sp.means, correlation = bm.sp, method = ML) anova(fit0, fit1, fit2) #check AIC #pseudoR2 (not perfect, I know, but indicate somewhat the importance) G2 - (-2*(logLik(fit0) - logLik(fit1)))[1] n = nrow(sp.means) r2ML.phyl - 1 - exp(-G2/n) #phyl effect G2 - (-2*(logLik(fit0)-logLik(fit2)))[1] r2ML.both - 1 - exp(-G2/n) r2ML.hab - r2ML.both - r2ML.phyl #habitat effect As mentioned, there are many assumptions here (BM etc) but I wonder if this sort of approach is in general valid? Is there a standard way to deal with these kind of problems (e.g. same species in multiple groups)? Is there a way to incorporate replicates? (manually add/rows columns in the phylogenetic varcovar-matrix?) (this has been on the list before, MCMCglmm has been mentioned and maybe a more complete modeling approach is the way to go here) Key papers? Cheers Gustaf -- Gustaf Granath (PhD) Post doc Swedish University of Agricultural Sciences Department of Aquatic Sciences and Assessment ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/ -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336. ___ R-sig-phylo mailing list - R-sig-phylo@r-project.org https://stat.ethz.ch/mailman/listinfo/r-sig-phylo Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/