Hi Sam,
The terminology G and R structure is used widely, for example in
ASreml & SAS and probably others. The G-structure is the covariance
matrix of the random effects and the R-structure is the covariance
matrix of the residuals. In your model you have one random term
(animal) and one residual term (the default - units).
You haven't posted your model specification only the summary, so I
will presume that the animal term is linked to a phylogeny. There is a
single random effect for each of the n species and they have
correlation structure A. A is an nxn matrix representing the
proportion of evolution time shared between each species. The
covariance structure is Va*A where Va is a scalar variance estimated
from the data. The ... in the prior:
G=list(G1=list(...))
should contain your specification for the prior on Va.
units is a factor with a unique level for each row of the data.frame.
The default ~units therefore has the standard correlation structure I
for the residuals. I is an identity matrix (ones on the diagonal
zero's elsewhere) and represents the fact that we believe the
residuals are independent and homoscedastic. If there is only one
observation per species I is nxn also. The covariance structure is
Ve*I where Ve is a scalar variance estimated from the data. The ...
in the prior:
R=list(...)
should contain your specification for the prior on Ve.
Together we have the prior
prior=list(R=R,G=G)
You have this bit of code (which I've modified a bit):
l = rbv(tree, Va, nodes="TIPS") + rnorm(100, 0, sqrt(Ve))
which generates random data with covariance structure Va*A+Ve*I. Va=1
and Ve=1 in your example and Va/(Va+Ve)=0.5 is sometimes known as the
phylogenetic heritability (Lynch 1991) or more commonly Pagel's lambda
(Pagel 1999). Note that I've removed x from the code because it
doesn't really make sense to have it there. More likely you want
something like:
l = rbv(tree, Va, nodes="TIPS") + as.matrix(x)%*%beta + rnorm(100, 0,
sqrt(Ve))
where beta is a vector of regression coefficients (beta=c(1, 1))
determining the effect of being arboreal and being nocturnal
respectively on energy use. The data have the same covariance
structure as before conditional on the vector of means (x%*%beta).
However, the response in your analysis is not l but Energy. This was
generated with as:
50 + rnorm(100, 0, sqrt(Ve))
where Ve=20^2=400.
The summary shows that the posterior distribution returned by MCMCglmm
has support for the true values (Va=0, Ve=400, beta1=0, beta2=0). For
example Va is associated with the animal term in your model and so has
a posterior mean of 2.186 and 95% credible interval of 0 - 6.493.
If you rerun the analysis with l rather than Energy
data = data.frame(l, x, animal=tree$tip.label)
prior=list(R=list(V=1,nu=0), G=list(G1=list(V=1, nu=1, alpha.mu=0,
alpha.V=1000)))
# here I've used a flat *improper* prior for the Ve and a parameter
expanded prior for Va. The Va prior is a scaled F-distribution with 1
df for the numerator and denominator. Parameter expanded priors have
some nice properties, but the main advantage of parameter expansion is
algorithmic - it increases the efficiency of the MCMC algorithm when
the posterior for Va contains values close to zero.
m2<-MCMCglmm(l~arboreal+nocturnal, random=~animal, data=data,
pedigree=tree, prior=prior)
The summary shows that the posterior distribution returned by MCMCglmm
has support for the true values (Va=1, Ve=1, beta1=1, beta2=1).
summary(m2)
Iterations = 3001:12991
Thinning interval = 10
Sample size = 1000
DIC: 285.753
G-structure: ~animal
post.mean l-95% CI u-95% CI eff.samp
animal 1.49 0.1696 3.445 442.9
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 0.8819 0.6187 1.208 1000
Location effects: l ~ arboreal + nocturnal
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) -1.0719 -2.3445 0.0800 139.1 0.082 .
arboreal 1.3163 0.9545 1.6891 1000.0 <0.001 ***
nocturnal 0.9639 0.5477 1.3512 1108.4 <0.001 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Have you consulted the CourseNotes? I think not.
Cheers,
Jarrod
Quoting Sam Passmore <passmore....@gmail.com> on Sat, 15 Dec 2012
11:23:16 +1300:
Dear R-Sig-Phylo subscribers,
I am currently trying to perform some analysis using the MCMCglmm package,
however I am having trouble understanding some of the aspects of the
package and wondered if I could get your help?
I have tried to develop an example of my problem that I hope will make my
problems clear, but they are primarily associated around the meanings of
G-structure and R-structure. If anyone could give a broad definition of
what these parts of the analysis do and how they affect the results, that
would be most useful.
I have a list of species in a data frame with traits for each taxon. Thisis
made up of Yes/No variables and a continuous response variable we think is
related to these Yes/No variables. I also have a species phylogeny and
speciesÂ’ locations.
For the purpose of the example, letÂ’s say the response is how much energy a
species uses each day .
We want to model energy use as a function of whether a species is arboreal
(yes/no) and nocturnal (yes/no) and we want to control for (and measure the
effect of) phylogeny (later I would like to control for spatial
autocorrelation too).
tree = rcoal(100);
x = data.frame(arboreal = sample(c(0, 1), 100, replace = TRUE), nocturnal =
sample(c(0, 1), 100, replace = TRUE))
Now this is a bit of an aside, but I thought I would clear it up
l = rbv(tree, 1, nodes="TIPS") + x + rnorm(100)
is a piece of code I have seen commonly in examples, but it seems to me
that this is just to add some variation to the phylogeny. Is this the case?
I have a continuous response, which we can say is energy use
Energy = rnorm(100, 50, 20)
data = data.frame(Energy, x, animal=tree$tip.label)
Now, this is where I have driven off the main road. The implementation of
the priors are throwing me off my stride, primarily due to a lack of
attention during my Bayesian classes.
R & G-structures are a mystery to me. My guess would be they are something
to do with priors on residuals and Random effects. But I am unsure why the
are not like the priors I am used to using, which have a mean and a
variance.
I have read the following link numerous times, but it is not shedding any
light on it for me:
http://stats.stackexchange.com/questions/32994/lme4-vs-mcmcglmm-what-are-r-structure-g-structure-in-a-glmm
I used the following model and default priors, which lead to the output
below
Could anyone explain to me what the G-structure and R-structure sections in
the summary output are saying?
Preferably with reference to my example.
summary(m1)
Iterations = 3001:12991
Thinning interval = 10
Sample size = 1000
DIC: 868.0991
G-structure: ~animal
post.mean l-95% CI u-95% CI eff.samp
animal 2.186 3.495e-08 6.493 42.85
R-structure: ~units
post.mean l-95% CI u-95% CI eff.samp
units 332 239.8 421.4 1000
Location effects: Energy ~ arboreal + nocturnal
post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept) 48.1297 42.1408 54.3912 1000 <0.001 ***
arboreal 3.4769 -4.2014 9.8992 1000 0.342
nocturnal -0.9693 -8.5014 5.6701 1000 0.806
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Thanks,
Sam
[[alternative HTML version deleted]]
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/