Re: [R-sig-phylo] Accounting for phylogeny in binary predictor, binary response data

Jarrod Hadfield Thu, 11 Feb 2016 00:44:50 -0800

Hi Grace,

Sorry for not responding earlier. The prior for the fixed effects looksOK - it can be useful to use such priors in the case of complete separation.

Without seeing the data it is hard to diagnose why the mixing is poor.My guess is that a phylogenetic heritability (signal) close to one hashigh support, which slows the mixing of the chain down. Is this thecase, what does


m1$VCV[, 1] / (rowSums(m1$VCV) + pi^2/3)

look like?

The parameter expanded prior although reasonably flat for heritabilitiesin the range 0.05-0.98 has spikes close to 0 and 1. If the data givesupport in these regions (either because the true value is close to oneof these extremes, or because there is not much information so thelikelihood surface is flat) then the posterior can get dragged up into them.


Two possible options are a) to switch to a chi-square prior:

G=list(V=1, nu=1000, alpha.mu=0, alpha.V=1)

Pierre de Villemereuil demonstrates that this can have better propertieswith a probit link model (he uses family="ordinal")


http://onlinelibrary.wiley.com/store/10.1111/2041-210X.12011/asset/supinfo/mee312011-sup-0002-Appendix-B.pdf?v=1&s=223aa0df3256a23f694f67254cde03841149ae3e

With family="ordinal" the heritability is calculated as m1$VCV[, 1] /(rowSums(m1$VCV) + 1). However, I would switch to family="threshold"which is standard probit regression and the heritability is simplym1$VCV[, 1] / rowSums(m1$VCV). Mixing is generally better in the latter.

Option b) is to assume the heritability is one (rather than estimateit). This can be done using MCMCglmmRAM, and example code can be foundhere.


http://onlinelibrary.wiley.com/store/10.1111/2041-210X.12354/asset/supinfo/mee312354-sup-0003-AppendixS3.pdf?v=1&s=58d260cb8c9b14812b164806c5f0a9af68af2b23

Note that this is an alternative algorithm for Felsenstein's thresholdmodel (which is a probit glmm with h2 fixed at one) and so could befitted using other software.


Cheers,

Jarrod









On 10/02/2016 18:33, Jörg Albrecht wrote:

Dear Grace,

the problem may derive from your specification of the priors. Usually you don’t 
specify the prior for B in MCMCglmm. The problem may also be related to the 
size of your dataset. Estimation of effects can be difficult with binary data, 
when the dataset is small. Below is a small example from Jarrod Hadfield for 
binary regression that accounts for phylogeny.


tree <- rcoal(200)
x <- rnorm(200)
l <- rbv(tree, 1, nodes="TIPS") + x + rnorm(200)
y <- rbinom(200, 1, plogis(l))
dat <- data.frame(y = y, x = x, species = tree$tip.label)
prior1 <- list(R = list(V = 1, fix = 1),
                G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 
1000)))
# residual variance fixed at 1.
Ainv <- inverseA(tree)$Ainv
m1 <- MCMCglmm(y ~ x, random = ~ species,
                ginverse = list(species = Ainv),
                family = "categorical",
                prior = prior1, data = dat)
# fixed effects should be around zero and one (+/- monte carlo error)
summary(m1$Sol)
# phylogenetic ICC should be 1/(2 + pi^2/3) = 0.189 (+/- monte carlo error)
summary(m1$VCV[, 1] / (rowSums(m1$VCV) + pi^2/3))

Hope this helps,

Jörg

—
Jörg Albrecht, PhD
Postdoctoral researcher
Institute of Nature Conservation
Polish Academy of Sciences
Mickiewicza 33
31-120 Krakow, Poland
www.carpathianbear.pl <http://www.carpathianbear.pl/>
www.globeproject.pl <http://www.globeproject.pl/>
www.iop.krakow.pl <http://www.iop.krakow.pl/>

Am 10.02.2016 um 00:55 schrieb Grace Pold <grace.p...@gmail.com>:

Hello,

I have characterized a few hundred bacteria from two environments and want to know if the bacteria 
from one environment is more likely to show a trait than the bacteria isolated from the other 
environment. So my data is binary in both the independent and in the dependent variable: 
"environment 1?" yes/no, and "degrades carbon source?" yes/no. If I wasn’t 
accounting for phylogeny, I think I would use a Chi-Squared test. But I would like to account for 
phylogeny in my analysis, since some of the bacteria form clusters on my phylogenetic tree with 
members only from one environment.

I thought maybe I could use MCMCglmm to test this, and have been following the 
examples previously posted on r-sig-phylo and in the MCMCglmm course notes. 
However, my model either fails to converge even after millions of iterations, 
or the autocorrelation plot shows strong positive correlations at a range of 
lags. So I think either I cannot use MCMCglmm for this, or I am specifying my 
model wrong. Any pointers in the right direction would be greatly appreciated.

Here is my model:

prior.m2c.5 = list(B = list(mu = c(0, 0), V = diag(2) *(1 + pi^2/3)), R = 
list(V = 1, fix = 1), G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 
1000)))

simplemcmcCMCv2_5<-MCMCglmm(carbon01~environment01, random=~animal, pedigree=ctree, 
data=wcboth, family="categorical”, nitt=10000000, burnin=100000,thin=2000, 
prior=prior.m2c.5)

Thank you in advance for any help,

Grace Pold

Graduate Program in Organismic and Evolutionary Biology
University of Massachusetts, Amherst
        [[alternative HTML version deleted]]

_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


        [[alternative HTML version deleted]]

_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/



--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Accounting for phylogeny in binary predictor, binary response data

Reply via email to