Hi,
You might try looking at:
A general and simple method for obtaining R2 from generalized linear
mixed-effects models by Nakagawa & Schielzeth.
They detail some of the issues with comparing null versus full models,
and it is a well presented paper.
I can't remember whether they deal with the fact that the standard
estimator for the R^2 is upwardly biased, as is the variance explained
by the fixed effects. It is easier to see this from a frequentist
point of view: imagine a covariate x with an estimated regression
coefficient bhat. bhat = b + d where b is the true coefficient and d
is the sampling deviation.
The variance explained by x is VAR(x*b) = VAR(x)b^2 but
E[VAR(x*bhat)] (where the expectation is taken over possible
estimates, bhat) is
VAR(x)b^2+VAR(x)VAR(d). The estimated R^2 is upwardly biased by
uncertainty in the fixed effects: VAR(x)VAR(d) is always positive. Its
not that surprising: under least squares bhat is the value that
explains most of the variance in the data: if b differs from bhat, b
must explain less.
Cheers,
Jarrod
The derivation for the above is:
E[VAR(x*bhat)] = E[VAR(x*(b+d))]
= E[VAR(x)(b+d)^2]
= E[VAR(x)(b^2+2bd+d^2)]
= VAR(x)(b^2+VAR(d))
d^2 = VAR(d) and E[2bd]=0 for unbiased estimators because E[d]=0.
Quoting Gustaf Granath <gustaf.gran...@gmail.com> on Thu, 22 Jan 2015
13:17:58 +0100:
Hi
Va/(Va+Ve) is the proportion of variance explained by phylogeny
*after* accounting for the fixed effects. To include the variance
explained by the fixed effects in the denominator is a bit
difficult, because you have to assume something about the
distribution of predictors. For example do you want the different
habitat types to be equally represented, or at the frequency with
which they were sampled?
I see. I thought I could just look at the change in Va with or
without the fixed effect (full versus null model). The biological
question is if the observed trait variation is mainly a result of
phylogeny or the environment, hence the focus on variance explained
by the different components. I must admit that im not following why
the distribution of the predictor is important (or what kind of
approach you are referring to), but I guess that "frequency with
which they were sampled" is most appropriate for the question. This
should be easier for a continuous predictor, right?
Cheers
Gustaf
On 2015-01-22 12:23, Jarrod Hadfield wrote:
Hi Gustaf,
In the model with just species the residual variation is
measurement error/plasticity error, but could also include
deviations from the assumed BM process. If you add species.ide that
term captures deviations from the assumed BM process.
Va/(Va+Ve) is the proportion of variance explained by phylogeny
*after* accounting for the fixed effects. To include the variance
explained by the fixed effects in the denominator is a bit
difficult, because you have to assume something about the
distribution of predictors. For example do you want the different
habitat types to be equally represented, or at the frequency with
which they were sampled?
Cheers,
Jarrod
Quoting Gustaf Granath <gustaf.gran...@gmail.com> on Thu, 22 Jan
2015 11:11:02 +0100:
Jarrod,
Thanks for your input. Dropping species.ide result in a model that
is estimated without any problems.
prior = list(R = list(V = 1, nu = 0), G = list(G1 = list(V = 1, nu
= 1, alpha.mu = 0, alpha.V = 1000)))
m1.mcmc <- MCMCglmm(y ~ habitat, random =~ species,
data = traits, ginverse = list(species = Ainv),
nitt = 700000, thin=100, prior = prior)
It seems to be rather robust (not sensitive to priors) and Im not
sure I understand why you would need 100s of species.
But how is this model related to the simpler approach I described
in my first post?
The error term (Ve) is the within species error
(measurement/plasticity error) and the species variance (Va) is
the phylogenetic variance in my data. Lambda (Pagel's) is then
Va/(Va+Ve).
(m1.mcmc$VCV[,"species"] / (m1.mcmc$VCV[,"species"] +
m1.mcmc$VCV[,"units"]))
# BUT is this only correct under a model without habitat?
If habitat is included in the model the between species variation
not explained by phylogeny or habitat is included in Ve because I
dropped the speceis.ide, or am I missing something here?
If I run a model with and without habitat I can extract the amount
of variation explained by habitat (differences in Ve = Vhab) and
get a ballpark number on the relative contribution to the pattern
we observe:
Va / (Va+Vhab+Ve) #phylo
Vhab / (Va+Vhab+Ve) #habitat
Ve / (Va+Vhab+Ve) #measurement/plasticity/local adaption .... and
other processes
Did I get that right or am I lost?
Gustaf
On 2015-01-22 04:54, Jarrod Hadfield wrote:
Hi Gustaf,
1/ You can ignore nhabitat: for some reason I thought you thought
that there might be systematic differences between specialists
and non-specialists.
2/ species and species.ide are identical, but species effects are
structured by the phylogeny due to the ginverse specification.
species.ide are treated as iid.
3/ With few species and few habitats, and very few species
observed in multiple habitats, I think you will just have to go
for the simple model you describe (perhaps even dropping
species.ide):
m1.mcmc<-MCMCglmm(trait~as.factor(habitat)+x, random =~
species+species.ide, data=my_data, ginverse=list(species=Ainv))
Expect the variance components to be poorly estimated and prior
sensitive. 100's of species are required to get even moderately
precise estimates of the phylogenetic variance.
Cheers,
Jarrod
Quoting Gustaf Granath <gustaf.gran...@gmail.com> on Thu, 22 Jan
2015 01:15:23 +0100:
Jarrod,
I tried this but Im not sure it works well for my problem and/or
Im doing something wrong.
First, Im not sure what nhabitat is. A vector specifying the
number of habitats that the species is found in? I.e.
my_data$nhabitat may look like:
1,1,1,1,2,2,2,2, 2,2,2,2 .... if the first two species occurs in
one and two habitats, respectively.
Second, species.ide. If it is the same as species, why not just:
+ species + species ?
Should it be a vector identical to species but just names species.ide?
So to my data, there are only three habitats (3 levels), thus Im
not sure it works well modeling it as a random factor (and as
you indicate we need a "reasonable number of habitats"). There
are 2, 2 and 11 species in each habitat, and only a few are
found in multiple habitats. So Im really looking at simpler
model because there are not enough data for complicated
structures. With so few species things are pushed to the limit
but my guess is that the main problem here is that habitat
cannot be fitted as a random factor. Possible to model it as a
fixed effect?
However, there are other predictors of interest as well,
continuous variables. How would that be modeled then?
Like this (predictor=x)?
m1.mcmc<-MCMCglmm(trait~ x, random =~ species+species.ide,
data=my_data, ginverse=list(species=Ainv))
I would also like to thank Tony for recommending excellent
papers and I didnt know about the pez package which looks
promising.
Cheers
Gustaf
On 2015-01-21 16:26, Jarrod Hadfield wrote:
Dear Gustaf,
How many levels of `habitat' are there, and are they
cross-classified with respect to species (i.e. are multiple
species measured in the same habitat)?
Assuming for now there are a reasonable number of habitats then
the simplest model (without cross-classification) in
asreml/MCMCglmm would be
Ainv<-inverseA(my_tree)$Ainv
m1.asreml<-asreml(trait~nhabitat, random =~
habitat+giv(species)+species.ide, data=my_data,
ginverse=list(species=sm2asreml(Ainv)))
m1.mcmc<-MCMCglmm(trait~nhabitat, random =~
habitat+species+species.ide, data=my_data,
ginverse=list(species=Ainv))
nhabitat is the number of habitats a species is found in.
species.ide is the same as species and models the between
species variation not captured by the BM model. Note that this
requires you using the individual data rather than the species
means (which is to be preferred).
With cross-classification you may want to have the effects of
the same habitat vary across species. There's many ways to do
this, but for now lets assume i) that in general they could be
positively correlated (if the trait increases for species 1 in
habitat 1 then it is also likely to increase the trait in
species 2) and ii) if species 1 and species 2 are close
relatives the habitat is likely to effect them in a more
similar way:
m2.asreml<-asreml(trait~nhabitat, random =~
habitat+giv(species)+giv(species):habitat+species.ide,
data=my_data, ginverse=list(species=sm2asreml(Ainv)))
m2.mcmc<-MCMCglmm(trait~nhabitat, random =~
habitat+species+species:habitat+species.ide, data=my_data,
ginverse=list(species=Ainv))
There's many more possibilities of course, depending on the
biology of the problem.
Cheers,
Jarrod
Quoting Gustaf Granath <gustaf.gran...@gmail.com> on Wed, 21
Jan 2015 15:59:23 +0100:
Dear all,
Is it possible to quantify the influence of phylogeny and
environment on a measured trait?
Say that trait y has been measured in several species located
in different habitats. Some species have been measured in more
than one habitat (less specialized species). Now I want to say
something about how much variation in my trait that may be
explained by phylogenetic structure (I have the phyl tree) and
the habitat. I also have multiple measurements per species in
the habitat(s) where is occurs (replicates) but lets ignore
that for now and only use the means.
One way would be to calculate phylogenetic corrected trait
means (as in Bloomberg's K).
Here it is easy to add rows/columns in the vcv.phylo(tree) for
species that occurs in more than one habitat and it is assumed
that they have the same postion in the tree (ie within-species
correlation is = 1)
Residual sums of squares for the phylo effect can be
calculated and the corrected trait means are used in a linear
model to test the habitat effect.
R2 for both factors (phylo, habitat) can then be calculated.
Another method is to use gls(). But I dont know how to deal
with species occurring in multiple habitats here (can you feed
the gls with a manually contructed var-covariance matrix?). If
I drop some species x habitat combinations (so species can
only occur in one habitat) I can do:
bm.sp <- corBrownian(phy = tree)
fit0 <- gls(trait ~ 1, sp.means, method = "ML")
fit1 <- gls(trait ~ 1, sp.means, correlation = bm.sp, method = "ML")
fit2 <- gls(trait ~ Habitat, sp.means, correlation = bm.sp,
method = "ML")
anova(fit0, fit1, fit2) #check AIC
#pseudoR2 (not perfect, I know, but indicate somewhat the "importance")
G2 <- (-2*(logLik(fit0) - logLik(fit1)))[1]
n = nrow(sp.means)
r2ML.phyl <- 1 - exp(-G2/n) #phyl effect
G2 <- (-2*(logLik(fit0)-logLik(fit2)))[1]
r2ML.both <- 1 - exp(-G2/n)
r2ML.hab <- r2ML.both - r2ML.phyl #habitat effect
As mentioned, there are many assumptions here (BM etc) but I
wonder if this sort of approach is in general valid?
Is there a "standard way" to deal with these kind of problems
(e.g. same species in multiple groups)?
Is there a way to incorporate replicates? (manually add/rows
columns in the phylogenetic varcovar-matrix?)
(this has been on the list before, MCMCglmm has been mentioned
and maybe a more complete modeling approach is the way to go
here)
Key papers?
Cheers
Gustaf
--
Gustaf Granath (PhD)
Post doc
Swedish University of Agricultural Sciences
Department of Aquatic Sciences and Assessment
_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at
http://www.mail-archive.com/r-sig-phylo@r-project.org/
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/