Re: [R-sig-phylo] MCMCglmm: G-structure & R-Structure

Jarrod Hadfield Mon, 17 Dec 2012 01:27:00 -0800

Hi Sam,

The terminology G and R structure is used widely, for example inASreml & SAS and probably others. The G-structure is the covariancematrix of the random effects and the R-structure is the covariancematrix of the residuals. In your model you have one random term(animal) and one residual term (the default - units).

You haven't posted your model specification only the summary, so Iwill presume that the animal term is linked to a phylogeny. There is asingle random effect for each of the n species and they havecorrelation structure A. A is an nxn matrix representing theproportion of evolution time shared between each species. Thecovariance structure is Va*A where Va is a scalar variance estimatedfrom the data. The ... in the prior:


G=list(G1=list(...))

should contain your specification for the prior on Va.

units is a factor with a unique level for each row of the data.frame.The default ~units therefore has the standard correlation structure Ifor the residuals. I is an identity matrix (ones on the diagonalzero's elsewhere) and represents the fact that we believe theresiduals are independent and homoscedastic. If there is only oneobservation per species I is nxn also. The covariance structure isVe*I where Ve is a scalar variance estimated from the data. The ...in the prior:


R=list(...)

should contain your specification for the prior on Ve.

Together we have the prior

prior=list(R=R,G=G)

You have this bit of code (which I've modified a bit):

l = rbv(tree, Va, nodes="TIPS") + rnorm(100, 0, sqrt(Ve))

which generates random data with covariance structure Va*A+Ve*I. Va=1and Ve=1 in your example and Va/(Va+Ve)=0.5 is sometimes known as thephylogenetic heritability (Lynch 1991) or more commonly Pagel's lambda(Pagel 1999). Note that I've removed x from the code because itdoesn't really make sense to have it there. More likely you wantsomething like:

l = rbv(tree, Va, nodes="TIPS") + as.matrix(x)%*%beta + rnorm(100, 0,sqrt(Ve))

where beta is a vector of regression coefficients (beta=c(1, 1))determining the effect of being arboreal and being nocturnalrespectively on energy use. The data have the same covariancestructure as before conditional on the vector of means (x%*%beta).

However, the response in your analysis is not l but Energy. This wasgenerated with as:


 50 + rnorm(100, 0, sqrt(Ve))

where Ve=20^2=400.

The summary shows that the posterior distribution returned by MCMCglmmhas support for the true values (Va=0, Ve=400, beta1=0, beta2=0). Forexample Va is associated with the animal term in your model and so hasa posterior mean of 2.186 and 95% credible interval of 0 - 6.493.


If you rerun the analysis with l rather than Energy

data = data.frame(l, x, animal=tree$tip.label)

prior=list(R=list(V=1,nu=0), G=list(G1=list(V=1, nu=1, alpha.mu=0,alpha.V=1000)))

# here I've used a flat *improper* prior for the Ve and a parameterexpanded prior for Va. The Va prior is a scaled F-distribution with 1df for the numerator and denominator. Parameter expanded priors havesome nice properties, but the main advantage of parameter expansion isalgorithmic - it increases the efficiency of the MCMC algorithm whenthe posterior for Va contains values close to zero.

m2<-MCMCglmm(l~arboreal+nocturnal, random=~animal, data=data,pedigree=tree, prior=prior)

The summary shows that the posterior distribution returned by MCMCglmmhas support for the true values (Va=1, Ve=1, beta1=1, beta2=1).

summary(m2)


 Iterations = 3001:12991
 Thinning interval  = 10
 Sample size  = 1000

 DIC: 285.753

 G-structure:  ~animal

       post.mean l-95% CI u-95% CI eff.samp
animal      1.49   0.1696    3.445    442.9

 R-structure:  ~units

      post.mean l-95% CI u-95% CI eff.samp
units    0.8819   0.6187    1.208     1000

 Location effects: l ~ arboreal + nocturnal

            post.mean l-95% CI u-95% CI eff.samp  pMCMC
(Intercept)   -1.0719  -2.3445   0.0800    139.1  0.082 .
arboreal       1.3163   0.9545   1.6891   1000.0 <0.001 ***
nocturnal      0.9639   0.5477   1.3512   1108.4 <0.001 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Have you consulted the CourseNotes? I think not.

Cheers,

Jarrod

Quoting Sam Passmore <passmore....@gmail.com> on Sat, 15 Dec 201211:23:16 +1300:

Dear R-Sig-Phylo subscribers,



I am currently trying to perform some analysis using the MCMCglmm package,
however I am having trouble understanding some of the aspects of the
package and wondered if I could get your help?



I have tried to develop an example of my problem that I hope will make my
problems clear, but they are primarily associated around the meanings of
G-structure and R-structure. If anyone could give a broad definition of
what these parts of the analysis do and how they affect the results, that
would be most useful.



I have a list of species in a data frame with traits for each taxon. Thisis
made up of Yes/No variables and a continuous response variable we think is
related to these Yes/No variables. I also have a species phylogeny and
species locations.

For the purpose of the example, lets say the response is how much energy a
species uses each day .



We want to model energy use as a function of whether a species is arboreal
(yes/no) and nocturnal (yes/no) and we want to control for (and measure the
effect of) phylogeny (later I would like to control for spatial
autocorrelation too).



tree = rcoal(100);

x = data.frame(arboreal = sample(c(0, 1), 100, replace = TRUE), nocturnal =
sample(c(0, 1), 100, replace = TRUE))



Now this is a bit of an aside, but I thought I would clear it up



l = rbv(tree, 1, nodes="TIPS") + x + rnorm(100)

is a piece of code I have seen commonly in examples, but it seems to me
that this is just to add some variation to the phylogeny. Is this the case?



I have a continuous response, which we can say is energy use

Energy = rnorm(100, 50, 20)

data = data.frame(Energy, x, animal=tree$tip.label)



Now, this is where I have driven off the main road. The implementation of
the priors are throwing me off my stride, primarily due to a lack of
attention during my Bayesian classes.

R & G-structures are a mystery to me. My guess would be they are something
to do with priors on residuals and Random effects. But I am unsure why the
are not like the priors I am used to using, which have a mean and a
variance.



I have read the following link numerous times, but it is not shedding any
light on it for me:



http://stats.stackexchange.com/questions/32994/lme4-vs-mcmcglmm-what-are-r-structure-g-structure-in-a-glmm





I used the following model and default priors, which lead to the output
below



Could anyone explain to me what the G-structure and R-structure sections in
the summary output are saying?

Preferably with reference to my example.



summary(m1)



 Iterations = 3001:12991

 Thinning interval  = 10

 Sample size  = 1000



 DIC: 868.0991



 G-structure:  ~animal



       post.mean  l-95% CI u-95% CI eff.samp

animal     2.186 3.495e-08    6.493    42.85



 R-structure:  ~units



      post.mean l-95% CI u-95% CI eff.samp

units       332    239.8    421.4     1000



 Location effects: Energy ~ arboreal + nocturnal



            post.mean l-95% CI u-95% CI eff.samp  pMCMC

(Intercept)   48.1297  42.1408  54.3912     1000 <0.001 ***

arboreal         3.4769  -4.2014   9.8992     1000  0.342

nocturnal       -0.9693  -8.5014   5.6701     1000  0.806

---

Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Thanks,

Sam

        [[alternative HTML version deleted]]




--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] MCMCglmm: G-structure & R-Structure

Reply via email to