Re: [R-sig-phylo] Partitioning phylogentic and environment effect on trait distribution

Jarrod Hadfield Thu, 22 Jan 2015 05:52:26 -0800

Hi,

You might try looking at:

A general and simple method for obtaining R2 from generalized linearmixed-effects models by Nakagawa & Schielzeth.

They detail some of the issues with comparing null versus full models,and it is a well presented paper.

I can't remember whether they deal with the fact that the standardestimator for the R^2 is upwardly biased, as is the variance explainedby the fixed effects. It is easier to see this from a frequentistpoint of view: imagine a covariate x with an estimated regressioncoefficient bhat. bhat = b + d where b is the true coefficient and dis the sampling deviation.

The variance explained by x is VAR(x*b) = VAR(x)b^2 butE[VAR(x*bhat)] (where the expectation is taken over possibleestimates, bhat) isVAR(x)b^2+VAR(x)VAR(d). The estimated R^2 is upwardly biased byuncertainty in the fixed effects: VAR(x)VAR(d) is always positive. Itsnot that surprising: under least squares bhat is the value thatexplains most of the variance in the data: if b differs from bhat, bmust explain less.


Cheers,

Jarrod

The derivation for the above is:

 E[VAR(x*bhat)] = E[VAR(x*(b+d))]
                = E[VAR(x)(b+d)^2]
                = E[VAR(x)(b^2+2bd+d^2)]
                = VAR(x)(b^2+VAR(d))

d^2 = VAR(d) and E[2bd]=0 for unbiased estimators because E[d]=0.

Quoting Gustaf Granath <gustaf.gran...@gmail.com> on Thu, 22 Jan 201513:17:58 +0100:

Hi
Va/(Va+Ve) is the proportion of variance explained by phylogeny*after* accounting for the fixed effects. To include the varianceexplained by the fixed effects in the denominator is a bitdifficult, because you have to assume something about thedistribution of predictors. For example do you want the differenthabitat types to be equally represented, or at the frequency withwhich they were sampled?
I see. I thought I could just look at the change in Va with orwithout the fixed effect (full versus null model). The biologicalquestion is if the observed trait variation is mainly a result ofphylogeny or the environment, hence the focus on variance explainedby the different components. I must admit that im not following whythe distribution of the predictor is important (or what kind ofapproach you are referring to), but I guess that "frequency withwhich they were sampled" is most appropriate for the question. Thisshould be easier for a continuous predictor, right?
Cheers

Gustaf

On 2015-01-22 12:23, Jarrod Hadfield wrote:
Hi Gustaf,
In the model with just species the residual variation ismeasurement error/plasticity error, but could also includedeviations from the assumed BM process. If you add species.ide thatterm captures deviations from the assumed BM process.
Va/(Va+Ve) is the proportion of variance explained by phylogeny*after* accounting for the fixed effects. To include the varianceexplained by the fixed effects in the denominator is a bitdifficult, because you have to assume something about thedistribution of predictors. For example do you want the differenthabitat types to be equally represented, or at the frequency withwhich they were sampled?
Cheers,

Jarrod
Quoting Gustaf Granath <gustaf.gran...@gmail.com> on Thu, 22 Jan2015 11:11:02 +0100:
Jarrod,
Thanks for your input. Dropping species.ide result in a model thatis estimated without any problems.prior = list(R = list(V = 1, nu = 0), G = list(G1 = list(V = 1, nu= 1, alpha.mu = 0, alpha.V = 1000)))
m1.mcmc <- MCMCglmm(y ~ habitat, random =~ species,
data = traits, ginverse = list(species = Ainv),nitt = 700000, thin=100, prior = prior)It seems to be rather robust (not sensitive to priors) and Im notsure I understand why you would need 100s of species.
But how is this model related to the simpler approach I describedin my first post?The error term (Ve) is the within species error(measurement/plasticity error) and the species variance (Va) isthe phylogenetic variance in my data. Lambda (Pagel's) is thenVa/(Va+Ve).(m1.mcmc$VCV[,"species"] / (m1.mcmc$VCV[,"species"] +m1.mcmc$VCV[,"units"]))
# BUT is this only correct under a model without habitat?
If habitat is included in the model the between species variationnot explained by phylogeny or habitat is included in Ve because Idropped the speceis.ide, or am I missing something here?
If I run a model with and without habitat I can extract the amountof variation explained by habitat (differences in Ve = Vhab) andget a ballpark number on the relative contribution to the patternwe observe:
Va / (Va+Vhab+Ve) #phylo
Vhab / (Va+Vhab+Ve) #habitat
Ve / (Va+Vhab+Ve) #measurement/plasticity/local adaption .... andother processes
Did I get that right or am I lost?

Gustaf


On 2015-01-22 04:54, Jarrod Hadfield wrote:
Hi Gustaf,
1/ You can ignore nhabitat: for some reason I thought you thoughtthat there might be systematic differences between specialistsand non-specialists.
2/ species and species.ide are identical, but species effects arestructured by the phylogeny due to the ginverse specification.species.ide are treated as iid.
3/ With few species and few habitats, and very few speciesobserved in multiple habitats, I think you will just have to gofor the simple model you describe (perhaps even droppingspecies.ide):
m1.mcmc<-MCMCglmm(trait~as.factor(habitat)+x, random =~species+species.ide, data=my_data, ginverse=list(species=Ainv))
Expect the variance components to be poorly estimated and priorsensitive. 100's of species are required to get even moderatelyprecise estimates of the phylogenetic variance.
Cheers,

Jarrod
Quoting Gustaf Granath <gustaf.gran...@gmail.com> on Thu, 22 Jan2015 01:15:23 +0100:
Jarrod,
I tried this but Im not sure it works well for my problem and/orIm doing something wrong.
First, Im not sure what nhabitat is. A vector specifying thenumber of habitats that the species is found in? I.e.my_data$nhabitat may look like:1,1,1,1,2,2,2,2, 2,2,2,2 .... if the first two species occurs inone and two habitats, respectively.
Second, species.ide. If it is the same as species, why not just:+ species + species ?
Should it be a vector identical to species but just names species.ide?
So to my data, there are only three habitats (3 levels), thus Imnot sure it works well modeling it as a random factor (and asyou indicate we need a "reasonable number of habitats"). Thereare 2, 2 and 11 species in each habitat, and only a few arefound in multiple habitats. So Im really looking at simplermodel because there are not enough data for complicatedstructures. With so few species things are pushed to the limitbut my guess is that the main problem here is that habitatcannot be fitted as a random factor. Possible to model it as afixed effect?
However, there are other predictors of interest as well,continuous variables. How would that be modeled then?
Like this (predictor=x)?
m1.mcmc<-MCMCglmm(trait~ x, random =~ species+species.ide,data=my_data, ginverse=list(species=Ainv))
I would also like to thank Tony for recommending excellentpapers and I didnt know about the pez package which lookspromising.
Cheers

Gustaf


On 2015-01-21 16:26, Jarrod Hadfield wrote:
Dear Gustaf,
How many levels of `habitat' are there, and are theycross-classified with respect to species (i.e. are multiplespecies measured in the same habitat)?
Assuming for now there are a reasonable number of habitats thenthe simplest model (without cross-classification) inasreml/MCMCglmm would be
Ainv<-inverseA(my_tree)$Ainv
m1.asreml<-asreml(trait~nhabitat, random =~habitat+giv(species)+species.ide, data=my_data,ginverse=list(species=sm2asreml(Ainv)))
m1.mcmc<-MCMCglmm(trait~nhabitat, random =~habitat+species+species.ide, data=my_data,ginverse=list(species=Ainv))
nhabitat is the number of habitats a species is found in.species.ide is the same as species and models the betweenspecies variation not captured by the BM model. Note that thisrequires you using the individual data rather than the speciesmeans (which is to be preferred).
With cross-classification you may want to have the effects ofthe same habitat vary across species. There's many ways to dothis, but for now lets assume i) that in general they could bepositively correlated (if the trait increases for species 1 inhabitat 1 then it is also likely to increase the trait inspecies 2) and ii) if species 1 and species 2 are closerelatives the habitat is likely to effect them in a moresimilar way:
m2.asreml<-asreml(trait~nhabitat, random =~habitat+giv(species)+giv(species):habitat+species.ide,data=my_data, ginverse=list(species=sm2asreml(Ainv)))
m2.mcmc<-MCMCglmm(trait~nhabitat, random =~habitat+species+species:habitat+species.ide, data=my_data,ginverse=list(species=Ainv))
There's many more possibilities of course, depending on thebiology of the problem.
Cheers,

Jarrod
Quoting Gustaf Granath <gustaf.gran...@gmail.com> on Wed, 21Jan 2015 15:59:23 +0100:
Dear all,
Is it possible to quantify the influence of phylogeny andenvironment on a measured trait?
Say that trait y has been measured in several species locatedin different habitats. Some species have been measured in morethan one habitat (less specialized species). Now I want to saysomething about how much variation in my trait that may beexplained by phylogenetic structure (I have the phyl tree) andthe habitat. I also have multiple measurements per species inthe habitat(s) where is occurs (replicates) but lets ignorethat for now and only use the means.
One way would be to calculate phylogenetic corrected traitmeans (as in Bloomberg's K).Here it is easy to add rows/columns in the vcv.phylo(tree) forspecies that occurs in more than one habitat and it is assumedthat they have the same postion in the tree (ie within-speciescorrelation is = 1)Residual sums of squares for the phylo effect can becalculated and the corrected trait means are used in a linearmodel to test the habitat effect.
R2 for both factors (phylo, habitat) can then be calculated.
Another method is to use gls(). But I dont know how to dealwith species occurring in multiple habitats here (can you feedthe gls with a manually contructed var-covariance matrix?). IfI drop some species x habitat combinations (so species canonly occur in one habitat) I can do:
bm.sp <- corBrownian(phy = tree)
fit0 <- gls(trait ~ 1, sp.means, method = "ML")
fit1 <- gls(trait ~ 1, sp.means, correlation = bm.sp, method = "ML")
fit2 <- gls(trait ~ Habitat, sp.means, correlation = bm.sp,method = "ML")
anova(fit0, fit1, fit2) #check AIC

#pseudoR2 (not perfect, I know, but indicate somewhat the "importance")
G2 <- (-2*(logLik(fit0) - logLik(fit1)))[1]
n = nrow(sp.means)
r2ML.phyl <- 1 - exp(-G2/n) #phyl effect

G2 <- (-2*(logLik(fit0)-logLik(fit2)))[1]
r2ML.both <- 1 - exp(-G2/n)
r2ML.hab <- r2ML.both - r2ML.phyl #habitat effect
As mentioned, there are many assumptions here (BM etc) but Iwonder if this sort of approach is in general valid?
Is there a "standard way" to deal with these kind of problems(e.g. same species in multiple groups)?
Is there a way to incorporate replicates? (manually add/rowscolumns in the phylogenetic varcovar-matrix?)(this has been on the list before, MCMCglmm has been mentionedand maybe a more complete modeling approach is the way to gohere)
Key papers?

Cheers

Gustaf

--
Gustaf Granath (PhD)
Post doc
Swedish University of Agricultural Sciences
Department of Aquatic Sciences and Assessment

_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive athttp://www.mail-archive.com/r-sig-phylo@r-project.org/




--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Partitioning phylogentic and environment effect on trait distribution

Reply via email to