Re: [R-sig-phylo] phylogenetic correction and MCMC model

2021-05-20 Thread Jarrod Hadfield

Hi,

This error always turns out to be mispellings on the part of the user. 
Perhaps you have trailing white space in them?


Cheers,

Jarrod



On 19/05/2021 09:09, David Costantini wrote:

This email was sent to you by someone outside the University.
You should only click on links or attachments if you are certain that the email 
is genuine and the content is safe.

Hi Jarrod
there were actually a few misspelled names, but R
still tells me 'some levels of animal do not have a row entry in ginverse'
Might there be another reason for that?
cheers
d


David Costantini
Professor of Conservation Physiology
Muséum National d'Histoire Naturelle, CNRS
7 rue Cuvier
75005 Paris, France
Tel.: +33.(0)1.40.79.53.74
Associate Editor Functional Ecology
http://davidcostantini.wordpress.com/
https://twitter.com/DavidZool
http://scholar.google.com/citations?user=nBSC4-EJ=it

- Original Message -
From: "Jarrod Hadfield" 
To: "David COSTANTINI" 
Cc: "r-sig-phylo" 
Sent: Monday, 17 May, 2021 21:34:34
Subject: Re: [R-sig-phylo] phylogenetic correction and MCMC model

Hi,

chronos adds a chronos class to the object, creating problems. Anything
wrong with using:

phylo<-read.nexus("output.nex")
phylo<- di2multi(phylo)
inv.phylo<-inverseA(phylo)

?

Cheers,

Jarrod


On 17/05/2021 19:47, David Costantini wrote:

This email was sent to you by someone outside the University.
You should only click on links or attachments if you are certain that the email 
is genuine and the content is safe.

Hi Jarrod
here are the scripts:
phylo<-read.nexus("output.nex")
phylo$edge.length <- phylo$edge.length[1 + .001]
phylo_ultra=chronos(phylo, lambda=0)
is.ultrametric(phylo_ultra)
tree.unmatched <- multi2di(phylo_ultra, random=TRUE)
plotTree(tree.unmatched,fsize=0.6)
inv.phylo<-inverseA(tree.unmatched,nodes="TIPS",scale=TRUE)

When I do plotTree, I can visualise the tree.
but the original file looks like a list, here is just an extract of the 
original file output.nex:

#NEXUS BEGIN TREES; TREE tree_2467 = 
((Anguilla_anguilla:274.35321211,(((Oreochromis_tanganicae:128.,((Channa_argus:94.28562300,((Scophthalmus_maximus:64.4000,Paralichthys_dentatus:64.4000)'14':19.14618300,Seriola_quinqueradiata:83.54618300)'13':10.73944000)'11':15.71437700,(((Gasterosteus_aculeatus:94.9000,Perca_flavescens:94.9000)'10':15.1000,((Siganus_vulpinus:99.8000,(Lutjanus_peru:99.8000,Argyrosomus_regius:99.8000)'19':0.)'9':10.2000,Sparus_aurata:110.)'22':0.)'8':0.,(Zosterisessor_ophiocephalus:56.2000,Ponticola_eurycephalus:56.2000)'6':53.8000)'30':0.)'29':18.)'27':78.33545792,((Salmo_salar:12.51368429,Salmo_trutta:12.51368429)'35':33.20842500,Oncorhynchus_mykiss:45.72210929)'43':160.61334863).;END;


thanks!
d

PS. If I drop TIPS, I have the same message.


David Costantini
Professor of Conservation Physiology
Muséum National d'Histoire Naturelle, CNRS
7 rue Cuvier
75005 Paris, France
Tel.: +33.(0)1.40.79.53.74
Associate Editor Functional Ecology
http://davidcostantini.wordpress.com/
https://twitter.com/DavidZool
http://scholar.google.com/citations?user=nBSC4-EJ=it

- Original Message -
From: "Jarrod Hadfield" 
To: "David COSTANTINI" , "r-sig-phylo" 

Sent: Monday, 17 May, 2021 19:48:59
Subject: Re: [R-sig-phylo] phylogenetic correction and MCMC model

Hi David,

It looks like phylo_ultra might be a list? Is phylo_ultra[[1]] a tree?

Also, don't use nodes="TIPS"; this is just to demonstrate how poor the
algorithm is when you don't use the expanded inverse. I see people using
nodes="TIPS" a lot - where does this code come from?

Cheers,

Jarrod

On 17/05/2021 16:06, David Costantini wrote:

This email was sent to you by someone outside the University.
You should only click on links or attachments if you are certain that the email 
is genuine and the content is safe.

Dear All
I am trying to apply a phylogenetic correction to an MCMC model, but I have 
problems in making the inverse matrix. I can visualise the treeplot very well, 
but when I use the script:
inv.phylo<-inverseA(phylo_ultra,nodes="TIPS",scale=TRUE)

R tells me that there is an error:

Error in pedigree[, 2] : incorrect number of dimensions
In addition: Warning message:
In if (attr(pedigree, "class") == "phylo") { :

Do you have any experience with this? I couldn't find a solution so far.
Thanks in advance
David


   [[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] phylogenetic correction and MCMC model

2021-05-17 Thread Jarrod Hadfield

Hi,

Yes - all species in the data frame need to be in the phylogeny (but not 
vice versa).


Cheers,

Jarrod

On 17/05/2021 20:47, David Costantini wrote:

This email was sent to you by someone outside the University.
You should only click on links or attachments if you are certain that the email 
is genuine and the content is safe.

Hi Jarrod
the new script works well, no problems with it.
So I moved on trying the mcmc...
R says 'some levels of animal do not have a row entry in ginverse'
I suppose that the issue is that in my dataset I have species missing
in the ginverse, right?
Thanks again for your help
d


David Costantini
Professor of Conservation Physiology
Muséum National d'Histoire Naturelle, CNRS
7 rue Cuvier
75005 Paris, France
Tel.: +33.(0)1.40.79.53.74
Associate Editor Functional Ecology
http://davidcostantini.wordpress.com/
https://twitter.com/DavidZool
http://scholar.google.com/citations?user=nBSC4-EJ=it

- Original Message -
From: "Jarrod Hadfield" 
To: "David COSTANTINI" 
Cc: "r-sig-phylo" 
Sent: Monday, 17 May, 2021 21:34:34
Subject: Re: [R-sig-phylo] phylogenetic correction and MCMC model

Hi,

chronos adds a chronos class to the object, creating problems. Anything
wrong with using:

phylo<-read.nexus("output.nex")
phylo<- di2multi(phylo)
inv.phylo<-inverseA(phylo)

?

Cheers,

Jarrod


On 17/05/2021 19:47, David Costantini wrote:

This email was sent to you by someone outside the University.
You should only click on links or attachments if you are certain that the email 
is genuine and the content is safe.

Hi Jarrod
here are the scripts:
phylo<-read.nexus("output.nex")
phylo$edge.length <- phylo$edge.length[1 + .001]
phylo_ultra=chronos(phylo, lambda=0)
is.ultrametric(phylo_ultra)
tree.unmatched <- multi2di(phylo_ultra, random=TRUE)
plotTree(tree.unmatched,fsize=0.6)
inv.phylo<-inverseA(tree.unmatched,nodes="TIPS",scale=TRUE)

When I do plotTree, I can visualise the tree.
but the original file looks like a list, here is just an extract of the 
original file output.nex:

#NEXUS BEGIN TREES; TREE tree_2467 = 
((Anguilla_anguilla:274.35321211,(((Oreochromis_tanganicae:128.,((Channa_argus:94.28562300,((Scophthalmus_maximus:64.4000,Paralichthys_dentatus:64.4000)'14':19.14618300,Seriola_quinqueradiata:83.54618300)'13':10.73944000)'11':15.71437700,(((Gasterosteus_aculeatus:94.9000,Perca_flavescens:94.9000)'10':15.1000,((Siganus_vulpinus:99.8000,(Lutjanus_peru:99.8000,Argyrosomus_regius:99.8000)'19':0.)'9':10.2000,Sparus_aurata:110.)'22':0.)'8':0.,(Zosterisessor_ophiocephalus:56.2000,Ponticola_eurycephalus:56.2000)'6':53.8000)'30':0.)'29':18.)'27':78.33545792,((Salmo_salar:12.51368429,Salmo_trutta:12.51368429)'35':33.20842500,Oncorhynchus_mykiss:45.72210929)'43':160.61334863).;END;


thanks!
d

PS. If I drop TIPS, I have the same message.


David Costantini
Professor of Conservation Physiology
Muséum National d'Histoire Naturelle, CNRS
7 rue Cuvier
75005 Paris, France
Tel.: +33.(0)1.40.79.53.74
Associate Editor Functional Ecology
http://davidcostantini.wordpress.com/
https://twitter.com/DavidZool
http://scholar.google.com/citations?user=nBSC4-EJ=it

- Original Message -
From: "Jarrod Hadfield" 
To: "David COSTANTINI" , "r-sig-phylo" 

Sent: Monday, 17 May, 2021 19:48:59
Subject: Re: [R-sig-phylo] phylogenetic correction and MCMC model

Hi David,

It looks like phylo_ultra might be a list? Is phylo_ultra[[1]] a tree?

Also, don't use nodes="TIPS"; this is just to demonstrate how poor the
algorithm is when you don't use the expanded inverse. I see people using
nodes="TIPS" a lot - where does this code come from?

Cheers,

Jarrod

On 17/05/2021 16:06, David Costantini wrote:

This email was sent to you by someone outside the University.
You should only click on links or attachments if you are certain that the email 
is genuine and the content is safe.

Dear All
I am trying to apply a phylogenetic correction to an MCMC model, but I have 
problems in making the inverse matrix. I can visualise the treeplot very well, 
but when I use the script:
inv.phylo<-inverseA(phylo_ultra,nodes="TIPS",scale=TRUE)

R tells me that there is an error:

Error in pedigree[, 2] : incorrect number of dimensions
In addition: Warning message:
In if (attr(pedigree, "class") == "phylo") { :

Do you have any experience with this? I couldn't find a solution so far.
Thanks in advance
David


   [[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable ar

Re: [R-sig-phylo] phylogenetic correction and MCMC model

2021-05-17 Thread Jarrod Hadfield

Hi,

chronos adds a chronos class to the object, creating problems. Anything 
wrong with using:


phylo<-read.nexus("output.nex")
phylo<- di2multi(phylo)
inv.phylo<-inverseA(phylo)

?

Cheers,

Jarrod


On 17/05/2021 19:47, David Costantini wrote:

This email was sent to you by someone outside the University.
You should only click on links or attachments if you are certain that the email 
is genuine and the content is safe.

Hi Jarrod
here are the scripts:
phylo<-read.nexus("output.nex")
phylo$edge.length <- phylo$edge.length[1 + .001]
phylo_ultra=chronos(phylo, lambda=0)
is.ultrametric(phylo_ultra)
tree.unmatched <- multi2di(phylo_ultra, random=TRUE)
plotTree(tree.unmatched,fsize=0.6)
inv.phylo<-inverseA(tree.unmatched,nodes="TIPS",scale=TRUE)

When I do plotTree, I can visualise the tree.
but the original file looks like a list, here is just an extract of the 
original file output.nex:

#NEXUS BEGIN TREES; TREE tree_2467 = 
((Anguilla_anguilla:274.35321211,(((Oreochromis_tanganicae:128.,((Channa_argus:94.28562300,((Scophthalmus_maximus:64.4000,Paralichthys_dentatus:64.4000)'14':19.14618300,Seriola_quinqueradiata:83.54618300)'13':10.73944000)'11':15.71437700,(((Gasterosteus_aculeatus:94.9000,Perca_flavescens:94.9000)'10':15.1000,((Siganus_vulpinus:99.8000,(Lutjanus_peru:99.8000,Argyrosomus_regius:99.8000)'19':0.)'9':10.2000,Sparus_aurata:110.)'22':0.)'8':0.,(Zosterisessor_ophiocephalus:56.2000,Ponticola_eurycephalus:56.2000)'6':53.8000)'30':0.)'29':18.)'27':78.33545792,((Salmo_salar:12.51368429,Salmo_trutta:12.51368429)'35':33.20842500,Oncorhynchus_mykiss:45.72210929)'43':160.61334863).;END;


thanks!
d

PS. If I drop TIPS, I have the same message.


David Costantini
Professor of Conservation Physiology
Muséum National d'Histoire Naturelle, CNRS
7 rue Cuvier
75005 Paris, France
Tel.: +33.(0)1.40.79.53.74
Associate Editor Functional Ecology
http://davidcostantini.wordpress.com/
https://twitter.com/DavidZool
http://scholar.google.com/citations?user=nBSC4-EAAAAJ=it

- Original Message -
From: "Jarrod Hadfield" 
To: "David COSTANTINI" , "r-sig-phylo" 

Sent: Monday, 17 May, 2021 19:48:59
Subject: Re: [R-sig-phylo] phylogenetic correction and MCMC model

Hi David,

It looks like phylo_ultra might be a list? Is phylo_ultra[[1]] a tree?

Also, don't use nodes="TIPS"; this is just to demonstrate how poor the
algorithm is when you don't use the expanded inverse. I see people using
nodes="TIPS" a lot - where does this code come from?

Cheers,

Jarrod

On 17/05/2021 16:06, David Costantini wrote:

This email was sent to you by someone outside the University.
You should only click on links or attachments if you are certain that the email 
is genuine and the content is safe.

Dear All
I am trying to apply a phylogenetic correction to an MCMC model, but I have 
problems in making the inverse matrix. I can visualise the treeplot very well, 
but when I use the script:
inv.phylo<-inverseA(phylo_ultra,nodes="TIPS",scale=TRUE)

R tells me that there is an error:

Error in pedigree[, 2] : incorrect number of dimensions
In addition: Warning message:
In if (attr(pedigree, "class") == "phylo") { :

Do you have any experience with this? I couldn't find a solution so far.
Thanks in advance
David


  [[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

The University of Edinburgh is a charitable body, registered in Scotland, with 
registration number SC005336. Is e buidheann carthannais a th’ ann an Oilthigh 
Dhùn Èideann, clàraichte an Alba, àireamh clàraidh SC005336.


___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] phylogenetic correction and MCMC model

2021-05-17 Thread Jarrod Hadfield

Hi David,

It looks like phylo_ultra might be a list? Is phylo_ultra[[1]] a tree?

Also, don't use nodes="TIPS"; this is just to demonstrate how poor the
algorithm is when you don't use the expanded inverse. I see people using
nodes="TIPS" a lot - where does this code come from?

Cheers,

Jarrod

On 17/05/2021 16:06, David Costantini wrote:

This email was sent to you by someone outside the University.
You should only click on links or attachments if you are certain that the email 
is genuine and the content is safe.

Dear All
I am trying to apply a phylogenetic correction to an MCMC model, but I have 
problems in making the inverse matrix. I can visualise the treeplot very well, 
but when I use the script:
inv.phylo<-inverseA(phylo_ultra,nodes="TIPS",scale=TRUE)

R tells me that there is an error:

Error in pedigree[, 2] : incorrect number of dimensions
In addition: Warning message:
In if (attr(pedigree, "class") == "phylo") { :

Do you have any experience with this? I couldn't find a solution so far.
Thanks in advance
David


 [[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

The University of Edinburgh is a charitable body, registered in Scotland, with 
registration number SC005336. Is e buidheann carthannais a th’ ann an Oilthigh 
Dhùn Èideann, clàraichte an Alba, àireamh clàraidh SC005336.

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Comparing DIC of phylogenetic and non-phylogenetic GLMM run with MCMC (MCMCglmm)

2018-06-21 Thread Jarrod Hadfield

Hi Liam,

In multi-level models DIC can be 'focused' at different levels. In 
MCMCglmm, DIC is focussed at the highest possible level because this is 
the only level at which it can be analytically computed for non-Gaussian 
models. The highest level is not the level at which most scientists want 
their information criteria focussed, and so I would not recommend it. In 
fact I have wondered about removing it completely from MCMCglmm. 
Cross-validation is a much better approach, and in some ways is what 
information criteria aspire to. But its more computationally demanding 
of course.


Cheers,

Jarrod





On 21/06/2018 14:24, jonnations wrote:

Hi Liam,

I don't have the exact answer you are looking for, but I would highly
recommend the brms package in R. It is incredibly flexible and has
excellent diagnostic tools like LOO and WAIC that are easy to use and
interpret for model selection. I think it would work well for the models
you presented. There is an easy to follow tutorial on phylogenetic mixed
models too.

Also there is another list serve called "r-sig-mixed-models" that you might
be interested in. It's not "phylo" focused, but these sorts of questions
come up on there all the time.

Good luck!
Jon


ps- my first time responding to the list, sorry for any format errors




--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] selecting a set of incongruent trees from a posterior distribution

2017-07-26 Thread Jarrod Hadfield

Hi Jesse,

In order to account for phylogenetic uncertainty you are better just 
pulling trees from their posterior rather than choosing trees that are 
incongruent. The latter will give estimates biased toward those 
associated with extreme trees.


If the analysis is the binomial model you posted on R-sig-mixed you can 
speed up the analysis by passing the latent variables from one model as 
the starting values in the next model without a long burn-in. If the 
data are Gaussian you just need to pass the (co)varinaces as starting 
values.


Code is below. It is untested, and clearly you need to replace ... with 
the exact specifications of the model.


Note that this type of approach is still biased towards a lambda/h2 of 
0. To fix that you would need to simultaneously estimate the phylogeny 
and the parameters of the comparative analysis. The bias is probably 
small if the tree is well estimated, and doing it 'properly' would be 
difficult -  perhaps BEAST can do it.


Cheers,

Jarrod

ntree<-100  # number of trees

nsample<-100 # number of samples per tree

thin<-10 # thinning interval

model_all<-MCMCglmm(, pl=TRUE)  # initial model object to which 
results will be written


model_i<-model_all

model_all$Sol<-matrix(NA, nsample*ntree, ncol(model_all$Sol))

model_all$VCV<-matrix(NA, nsample*ntree, ncol(model_all$VCV))

# create empty matrices for writing results from each tree

for(i in 1:ntree){

# iterate over trees

model_i<-MCMCglmm(, pl=TRUE, 
start=list(liab=model_i$Liab[nsample,]), pedigree=tree[[i]], 
burnin=thin, thin=thin, nitt=thin*(nsample+1))


# fit model to tree i using starting values from previous model

model_all$Sol[(i-1)*nsample+1:nsample]<-model_i$Sol

model_all$VCV[(i-1)*nsample+1:nsample]<-model_i$VCV

}

model_all$Sol<-mcmc(model_all$Sol, thin=thin)

model_all$VCV<-mcmc(model_all$VCV, thin=thin)


On 26/07/2017 16:06, Santiago Sánchez wrote:

Hi Jesse,

As Eduardo says, if in fact you want to see how different trees are from a
"consensus", something that you could try is find the
maximum-clade-credibility (MCC) tree (you can do this with treeAnnotator
from BEAST). This will be a fully bifurcating tree and is essentially the
tree in the posterior distribution that maximizes  the overall clade
posterior probabilities (e.g. sum(posterior)). Once you have the MCC tree,
you can use the Robinson-Foulds distance metric (there is a function in
phytools) to compare all the topologies in your posterior distribution. The
trees with the lowest values will be the most incongruent with respect to
the MCC tree. Just remember to exclude the burnin.

Cheers,
Santiago

On Wed, Jul 26, 2017 at 6:01 AM  wrote:


Send R-sig-phylo mailing list submissions to
 r-sig-phylo@r-project.org

To subscribe or unsubscribe via the World Wide Web, visit
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
or, via email, send a message with subject or body 'help' to
 r-sig-phylo-requ...@r-project.org

You can reach the person managing the list at
 r-sig-phylo-ow...@r-project.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of R-sig-phylo digest..."


Today's Topics:

1. selecting a set of incongruent trees from a posterior
   distribution (Jesse Delia)
2. Re: selecting a set of incongruent trees from a posterior
   distribution (Eduardo Ascarrunz)


--

Message: 1
Date: Tue, 25 Jul 2017 16:16:30 -0400
From: Jesse Delia 
To: R-sig-phylo@r-project.org
Subject: [R-sig-phylo] selecting a set of incongruent trees from a
 posterior   distribution
Message-ID:
 <
ca+lom6ehmv6c2v_lso8b2sdqqvoqwfcrl56-4jhcmicvjyv...@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"

Dear list,

Does anyone know of a function or script that could select a set of the
most incongruent trees from a list of trees? Maybe I missed a post, but
haven't found anything.

I running mixed models, which take a lot of computational space on my lap
top. In effort to account for phylogenetic uncertainty, without having to
run 100s of chains, I figured maybe i could run models across a (much)
shorter list that accounts for more diversity in tree shape observed within
the posterior distribution. Not sure if this makes sense, and/or is
extremely complicated?

Thanks for you time!

Jesse

 [[alternative HTML version deleted]]



--

Message: 2
Date: Wed, 26 Jul 2017 11:33:12 +0200
From: Eduardo Ascarrunz 
To: Jesse Delia 
Cc: R Sig Phylo Listserv 
Subject: Re: [R-sig-phylo] selecting a set of incongruent trees from a
 posterior distribution
Message-ID:
 <
cakxhxpxj_8ucosjt-bzpzvsn+5quet9vxnd4ssd4bqaq-fp...@mail.gmail.com>
Content-Type: text/plain; charset="UTF-8"

Hi Jesse,

Do you want to select 

Re: [R-sig-phylo] How to incorporate intraspecific variation in MCMCglmm

2017-07-17 Thread Jarrod Hadfield

cc-ed back to the list

Some observations seem to have no sampling variation and so have a 
residual variance of zero - which is not allowed, and doesn't make much 
sense.


Cheers,

Jarrod
On 16/07/2017 00:16, Diogo B. Provete wrote:

Hi Jarrod,
I have tried all the suggestions for specifying the random effects. 
However, your last suggestion of making the residual covariance seems 
more appropriated in my case, since I have the standard error 
calculated already (in terms of jumped distance) and it's interpreted 
as intraspecific variance due to sampling error. But when I make my 
model as:


model1<-MCMCglmm(mean_distance~type_arena*type_of_stimulus,
*random=~Species, rcov=~idh(units):units*, data=df_spe, 
family="gaussian",  ginverse = list(Especie=treeAinv), nodes="ALL", 
prior=ve_priors, nitt=30, burnin=25000, thin = 100, verbose=FALSE)


with priors:

ve_priors = list(R = list(V = diag(df_spe$se_distance^2), fix=1),
 G=list(G1=list(V=1, nu=0.02)))

I get the following error message:

Error in priorformat(if (NOpriorG) { :
  V is not positive definite for some prior$G/prior$R elements .

I tried to google it and looked at your course notes ch. 3 and 4, but 
couldn't find anything helpful to understand it.


I'm attaching the first few lines of the prior V matrix.

Thanks once again,
Diogo

Em sex, 14 de jul de 2017 às 18:47, Diogo B. Provete 
<dbprov...@gmail.com <mailto:dbprov...@gmail.com>> escreveu:


Dear Jarrod,
Thanks very much for the quick reply.

I'll try to implement the changes in the model.

Have a nice weekend,
Diogo


    Em Sex, 14 de jul de 2017 17:48, Jarrod Hadfield
<j.hadfi...@ed.ac.uk <mailto:j.hadfi...@ed.ac.uk>> escreveu:

Hi Diogo,

First, your model1 is unlikely to be valid unless the residual
variance
happens to be 1. You should not fix it at one, and use a prior
like:

prior = list(R = list(V = 1, nu = 0.02), G=list(G1=list(V=1,
nu=0.02)))

Note that the residual variance (Ve) is the intra-specific
variance,
assumed constant over species, as in Lynch's h2/Pagels Lambda.
If you
have the standard error (se) of each observation you can also
allow
heterogeneity between observations as in random-effect
meta-analysis:

random=~Specie+idh(se)

If you fix the variance associated with se at 1 in the prior

prior = list(R = list(V = 1, nu = 0.02), G=list(G1=list(V=1,
nu=0.02), G2=list(V=1, fix=1)))

Then the intraspecific variance for species i is se^2_i+Ve

If you do not fix the variance associated with se at 1, for
example with

prior = list(R = list(V = 1, nu = 0.02), G=list(G1=list(V=1,
nu=0.02), G2=list(V=1, nu=0.02)))

and estimate the associated variance (Vse). Then the
intraspecific variance
  for species i is Vse*se^2_i+Ve. This is useful if you only
know the standard error up to proportionality.

Alternatively you can fit fixed-effect meta-analysis where all
the intraspecific variance is presumed to be due to sampling
error:

random=~Specie, rcov=~idh(units):units

with prior:

prior = list(R = list(V = diag(se^2), fix=1),
G=list(G1=list(V=1, nu=0.02)))

The intraspecific variance for species i is se^2_i. This is
quite an inefficient way of doing fixed-effect meta-analysis,
and one day I will make such analyses easier to fit, and
quicker to fit..

Also, don't use nodes="TIPS" in the call to inverseA. I only
allowed this option because its the parameterisation used by
most other software, but its a really bad way of doing it.
Stick with the default, node="ALL".

Cheers,

Jarrod









On 14/07/2017 14:26, Diogo B. Provete wrote:
> treeAinv<-inverseA(phylo,nodes="TIPS",scale=TRUE)$Ainv
>
> I included the following priors:
>
> prior = list(R = list(V = 1, fix = 1), G=list(G1=list(V=1,
nu=0.02)))


--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

-- 
*Diogo B. Provete, PhD.*

FAPESP Post-doctoral fellow
Department of Environmental Sciences
Centre for Sciences and Technologies for Sustainability
Federal University of São Carlos
Sorocaba

Gothenburg Global Biodiversity Centre
Box 462   SE-405 30
Göteborg, Sverige

diogoprovete.weebly.com <http://diogoprovete.weebly.com>

Cell phone: +5515981022137 <tel:%2815%29%2098102-2137>
Skype: diogoprovete

Editor: Amphibia-Reptilia | Biodiversity Data Journal

--
*Diogo B. Provete, PhD.*
FAPESP Post-doctoral fellow
Department 

Re: [R-sig-phylo] How to incorporate intraspecific variation in MCMCglmm

2017-07-14 Thread Jarrod Hadfield

Hi Diogo,

First, your model1 is unlikely to be valid unless the residual variance 
happens to be 1. You should not fix it at one, and use a prior like:


prior = list(R = list(V = 1, nu = 0.02), G=list(G1=list(V=1, nu=0.02)))

Note that the residual variance (Ve) is the intra-specific variance, 
assumed constant over species, as in Lynch's h2/Pagels Lambda. If you 
have the standard error (se) of each observation you can also allow 
heterogeneity between observations as in random-effect meta-analysis:


random=~Specie+idh(se)

If you fix the variance associated with se at 1 in the prior

prior = list(R = list(V = 1, nu = 0.02), G=list(G1=list(V=1, nu=0.02), 
G2=list(V=1, fix=1)))

Then the intraspecific variance for species i is se^2_i+Ve

If you do not fix the variance associated with se at 1, for example with

prior = list(R = list(V = 1, nu = 0.02), G=list(G1=list(V=1, nu=0.02), 
G2=list(V=1, nu=0.02)))

and estimate the associated variance (Vse). Then the intraspecific variance
 for species i is Vse*se^2_i+Ve. This is useful if you only know the standard 
error up to proportionality.

Alternatively you can fit fixed-effect meta-analysis where all the 
intraspecific variance is presumed to be due to sampling error:

random=~Specie, rcov=~idh(units):units

with prior:

prior = list(R = list(V = diag(se^2), fix=1), G=list(G1=list(V=1, nu=0.02)))

The intraspecific variance for species i is se^2_i. This is quite an 
inefficient way of doing fixed-effect meta-analysis, and one day I will make 
such analyses easier to fit, and quicker to fit..

Also, don't use nodes="TIPS" in the call to inverseA. I only allowed this option because 
its the parameterisation used by most other software, but its a really bad way of doing it. Stick 
with the default, node="ALL".

Cheers,

Jarrod

 








On 14/07/2017 14:26, Diogo B. Provete wrote:

treeAinv<-inverseA(phylo,nodes="TIPS",scale=TRUE)$Ainv

I included the following priors:

prior = list(R = list(V = 1, fix = 1), G=list(G1=list(V=1, nu=0.02)))



--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] Threshold models using threshBayes vs MCMCglmmRAM

2016-12-15 Thread Jarrod Hadfield

Hi Chris,

OK - stick with the RAM model, the h2 is so high you will run into 
numerical issues otherwise. In the two-trait model you might want to add 
in us(at.level(trait,1)):units into the random effects (make sure it is 
not the last term in the random formula) in case log.dep has a h2 
substantially less than 1. Having a multi-level response will help with 
power so I would go for it. threshBayes does handle ordinal responses 
but you would probably have to run it for a VERY long time to sample the 
posterior adequately.


Cheers,

Jarrod

On 16/12/2016 07:11, Chris Mull wrote:

Hi Jarrod,
I hadn't appreciated that the clustering of reproductive modes on the 
tree might limit out ability to detect some of these relationships. 
This is in fact a step in testing reproduction as an ordinal variable 
(egg-laying, lecithotrophic live-bearing, and matrotrophic 
live-bearing) which follows gradients of depth and latitude, and 
threshBayes cannot handle ordinal variables. Perhaps treating the data 
this way will help given more transitions. I have run the model in 
MCMCglmm and have appended the summary and attached the histogram of 
the liabilities. Thanks so much for your help with this...


summary(dep2)
Iterations = 3001:12991
 Thinning interval  = 10
 Sample size  = 1000

 DIC: 31.2585

 G-structure:  ~animal

   post.mean l-95% CI u-95% CI eff.samp
animal 82.1835.88140.16.266

 R-structure:  ~units

  post.mean l-95% CI u-95% CI eff.samp
units 1110

 Location effects: parity ~ log.med.depth

  post.mean l-95% CI u-95% CI eff.samp pMCMC
(Intercept)  0.4250 -13.5697  13.791328.54 0.946
log.med.depth   -0.3601  -4.4399   3.802216.48 0.862

On Thu, Dec 15, 2016 at 11:10 PM, Jarrod Hadfield <j.hadfi...@ed.ac.uk 
<mailto:j.hadfi...@ed.ac.uk>> wrote:


Hi Chris,

I think MCMCglmm is probably giving you the right answer. There
are huge chunks of the phylogeny that are either egg-laying and
live-bearing. The non-phylogenetic model shows a strong
relationship between reproductive mode and depth, and that might
be causal or it might just be because certain taxa tend to live at
greater depths and *happen to have* the same reproductive mode.
There's not much information in the phylogenetic spread of
reproductive modes to distinguish between these hypotheses, hence
the wide confidence intervals.   Just to be sure can you

a) just perform independent contrasts (not really suitable for
binary data, but probably won't give you an answer far off the
truth and is a nice simple sanity check).

b) using MCMCglmm (not MCMCglmmRAM) fit

prior.dep2<-=list(R=list(V=diag(1), fix=1),
G=list(G1=list(V=diag(1), nu=0.002)))

dep2<-MCMCglmm(parity~log.med.depth,
   random=~animal,
   rcov=~units,
   pedigree=shark.tree,
   data=traits,
   prior=prior.dep2,
   pr=TRUE,
   pl=TRUE,
   family="threshold")

an send me the summary and hist(dep2$Liab)

Cheers,

Jarrod



    On 16/12/2016 07:02, Jarrod Hadfield wrote:


Hi Chris,

I think ngen in threshbayes is not the number of full iterations
(i.e. a full update of all parameters), but the number of full
iterations multiplied by the number of nodes (2n-1). With n=600
species this means threshbayes has only really done about 8,000
iterations (i.e. about 1000X less than MCMCglmm). Simulations
suggest threshbayes is about half as efficient per full iteration
as MCMCglmm which means that it may have only collected
0.5*1132/1200 = 0.47 effective samples from the posterior. The
very extreme value and the surprisingly tight credible intervals
(+/-0.007) also suggest a problem.

**However**, the low effective sample size for the covariance in
the MCMglmm model is also worrying given the length of chain, and
may point to potential problems. Are egg-laying/live-bearing
scattered throughout the tree, or do they tend to aggregate a
lot? Can you send me the output from:

prior.dep<-=list(R=list(V=diag(1)*1e-15, fix=1),
G=list(G1=list(V=diag(1), fix=1)))

dep<-MCMCglmm(parity~log.med.depth,
   random=~animal,
   rcov=~units,
   pedigree=shark.tree,
   reduced=TRUE,
   data=traits,
   prior=prior2,
   pr=TRUE,
   pl=TRUE,
   family="threshold")

summary(dep)

summary(glm(parity~log.med.depth, data=traits,
family=binomial(link=probit)))

Cheers,

Jarrod



On 15/12/2016 20:59, Chris Mull

Re: [R-sig-phylo] Threshold models using threshBayes vs MCMCglmmRAM

2016-12-15 Thread Jarrod Hadfield

Hi Chris,

I think MCMCglmm is probably giving you the right answer. There are huge 
chunks of the phylogeny that are either egg-laying and live-bearing. The 
non-phylogenetic model shows a strong relationship between reproductive 
mode and depth, and that might be causal or it might just be because 
certain taxa tend to live at greater depths and *happen to have* the 
same reproductive mode. There's not much information in the phylogenetic 
spread of reproductive modes to distinguish between these hypotheses, 
hence the wide confidence intervals.   Just to be sure can you


a) just perform independent contrasts (not really suitable for binary 
data, but probably won't give you an answer far off the truth and is a 
nice simple sanity check).


b) using MCMCglmm (not MCMCglmmRAM) fit

prior.dep2<-=list(R=list(V=diag(1), fix=1), G=list(G1=list(V=diag(1), 
nu=0.002)))


dep2<-MCMCglmm(parity~log.med.depth,
   random=~animal,
   rcov=~units,
   pedigree=shark.tree,
   data=traits,
   prior=prior.dep2,
   pr=TRUE,
   pl=TRUE,
   family="threshold")

an send me the summary and hist(dep2$Liab)

Cheers,

Jarrod



On 16/12/2016 07:02, Jarrod Hadfield wrote:


Hi Chris,

I think ngen in threshbayes is not the number of full iterations (i.e. 
a full update of all parameters), but the number of full iterations 
multiplied by the number of nodes (2n-1). With n=600 species this 
means threshbayes has only really done about 8,000 iterations (i.e. 
about 1000X less than MCMCglmm). Simulations suggest threshbayes is 
about half as efficient per full iteration as MCMCglmm which means 
that it may have only collected 0.5*1132/1200 = 0.47 effective samples 
from the posterior. The very extreme value and the surprisingly tight 
credible intervals (+/-0.007) also suggest a problem.


*However*, the low effective sample size for the covariance in the 
MCMglmm model is also worrying given the length of chain, and may 
point to potential problems. Are egg-laying/live-bearing scattered 
throughout the tree, or do they tend to aggregate a lot? Can you send 
me the output from:


prior.dep<-=list(R=list(V=diag(1)*1e-15, fix=1), 
G=list(G1=list(V=diag(1), fix=1)))


dep<-MCMCglmm(parity~log.med.depth,
   random=~animal,
   rcov=~units,
   pedigree=shark.tree,
   reduced=TRUE,
   data=traits,
   prior=prior2,
   pr=TRUE,
   pl=TRUE,
   family="threshold")

summary(dep)

summary(glm(parity~log.med.depth, data=traits, 
family=binomial(link=probit)))


Cheers,

Jarrod



On 15/12/2016 20:59, Chris Mull wrote:

Hi All,
I am trying to look at the correlated evolution of traits using the
threshold model as implemented in phytools::threshBayes (Revell 2014) and
MCMCglmmRAM (Hadfield 2015). My understanding from Hadfield 2015 is that
the reduced animal models should yeild equivalent results, yet having run
both I am having trouble reconciling the results. I have a tree covering
~600 species of sharks. skates, and rays and am interested in testing for
the correlated evolution between reproductive mode (egg-laying and
live-bearing) with depth. When I test for this correlation using
phytools:threshBayes there is a clear result with a high correlation
coefficient (-0.986) as I would predict. When I run the same analysis using
MCMCglmmRAM I get a very different result, with a low degree of covariation
and CI crossing zero (-0.374; -3.638 - 3.163 95%CI). Both models are run
for 10,000,000 generations with the same bunr-in and sampling period. I
have looked at the trace plots for each model using coda and parameter
estimates and likelihodd . I'm not sure how to reconcile the differences in
these results and any advice would be greatly appreciated. I have appended
the code and outputs below.


###
#phytools::threshBayes#
###

X<-shark.data[c("parity","log.med.depth")]
X<-as.matrix(X)

#mcmc paramters (also see control options)
ngen<-1000
burnin<-0.2*ngen
sample<-500

thresh<-threshBayes(shark.tree,
X,
types=c("discrete","continuous"),
ngen=ngen,
control = list(sample=sample))

The return correlation is -0.986 (-0.99 - -0.976  95%HPD)


#
#MCMCglmm bivariate-gaussian#
#


prior2=list(R=list(V=diag(2)*1e-15, fix=1), G=list(G1=list(V=diag(2),

nu=0.002, fix=2)))

ellb.log.dep<-MCMCglmm(cbind(log.med.depth,parity)~trait-1,
   random=~us(trait):animal,
   rcov=~us(trait):units,
  

Re: [R-sig-phylo] Threshold models using threshBayes vs MCMCglmmRAM

2016-12-15 Thread Jarrod Hadfield

Hi Chris,

I think ngen in threshbayes is not the number of full iterations (i.e. a 
full update of all parameters), but the number of full iterations 
multiplied by the number of nodes (2n-1). With n=600 species this means 
threshbayes has only really done about 8,000 iterations (i.e. about 
1000X less than MCMCglmm). Simulations suggest threshbayes is about half 
as efficient per full iteration as MCMCglmm which means that it may have 
only collected 0.5*1132/1200 = 0.47 effective samples from the 
posterior. The very extreme value and the surprisingly tight credible 
intervals (+/-0.007) also suggest a problem.


*However*, the low effective sample size for the covariance in the 
MCMglmm model is also worrying given the length of chain, and may point 
to potential problems. Are egg-laying/live-bearing scattered throughout 
the tree, or do they tend to aggregate a lot? Can you send me the output 
from:


prior.dep<-=list(R=list(V=diag(1)*1e-15, fix=1), 
G=list(G1=list(V=diag(1), fix=1)))


dep<-MCMCglmm(parity~log.med.depth,
   random=~animal,
   rcov=~units,
   pedigree=shark.tree,
   reduced=TRUE,
   data=traits,
   prior=prior2,
   pr=TRUE,
   pl=TRUE,
   family="threshold")

summary(dep)

summary(glm(parity~log.med.depth, data=traits, 
family=binomial(link=probit)))


Cheers,

Jarrod



On 15/12/2016 20:59, Chris Mull wrote:

Hi All,
I am trying to look at the correlated evolution of traits using the
threshold model as implemented in phytools::threshBayes (Revell 2014) and
MCMCglmmRAM (Hadfield 2015). My understanding from Hadfield 2015 is that
the reduced animal models should yeild equivalent results, yet having run
both I am having trouble reconciling the results. I have a tree covering
~600 species of sharks. skates, and rays and am interested in testing for
the correlated evolution between reproductive mode (egg-laying and
live-bearing) with depth. When I test for this correlation using
phytools:threshBayes there is a clear result with a high correlation
coefficient (-0.986) as I would predict. When I run the same analysis using
MCMCglmmRAM I get a very different result, with a low degree of covariation
and CI crossing zero (-0.374; -3.638 - 3.163 95%CI). Both models are run
for 10,000,000 generations with the same bunr-in and sampling period. I
have looked at the trace plots for each model using coda and parameter
estimates and likelihodd . I'm not sure how to reconcile the differences in
these results and any advice would be greatly appreciated. I have appended
the code and outputs below.


###
#phytools::threshBayes#
###

X<-shark.data[c("parity","log.med.depth")]
X<-as.matrix(X)

#mcmc paramters (also see control options)
ngen<-1000
burnin<-0.2*ngen
sample<-500

thresh<-threshBayes(shark.tree,
X,
types=c("discrete","continuous"),
ngen=ngen,
control = list(sample=sample))

The return correlation is -0.986 (-0.99 - -0.976  95%HPD)


#
#MCMCglmm bivariate-gaussian#
#


prior2=list(R=list(V=diag(2)*1e-15, fix=1), G=list(G1=list(V=diag(2),

nu=0.002, fix=2)))

ellb.log.dep<-MCMCglmm(cbind(log.med.depth,parity)~trait-1,
   random=~us(trait):animal,
   rcov=~us(trait):units,
   pedigree=shark.tree,
   reduced=TRUE,
   data=traits,
   prior=prior2,
   pr=TRUE,
   pl=TRUE,
   family=c("gaussian", "threshold"),
   thin=500,
   burnin = 100,
   nitt = 1000)

summary(ellb.log.dep)

Iterations = 101:501
Thinning interval  = 500
Sample size  = 18000
DIC: 2930.751
G-structure:  ~us(trait):animal
 post.mean l-95% CI u-95% CI eff.samp
traitscale.depth:traitscale.depth.animal16.965   15.092   18.864
  18000
traitparity:traitscale.depth.animal -0.374   -3.6383.163
1132
traitscale.depth:traitparity.animal -0.374   -3.6383.163
1132
traitparity:traitparity.animal   1.0001.0001.000
  0
R-structure:  ~us(trait):units
post.mean l-95% CI u-95% CI eff.samp
traitscale.depth:traitscale.depth.units16.965   15.092   18.86418000
traitparity:traitscale.depth.units -0.374   -3.6383.163 1132
traitscale.depth:traitparity.units -0.374   -3.6383.163 1132
traitparity:traitparity.units   1.0001.0001.0000
Location effects: cbind(scale.depth, parity) ~ trait - 1
 post.mean l-95% CI u-95% CI eff.samp pMCMC

Re: [R-sig-phylo] Accounting for phylogeny in binary predictor, binary response data

2016-02-11 Thread Jarrod Hadfield

Hi Grace,

Sorry for not responding earlier. The prior for the fixed effects looks 
OK - it can be useful to use such priors in the case of complete separation.


Without seeing the data it is hard to diagnose why the mixing is poor. 
My guess is that a phylogenetic heritability (signal) close to one has 
high support, which slows the mixing of the chain down. Is this the 
case, what does


m1$VCV[, 1] / (rowSums(m1$VCV) + pi^2/3)

look like?

The parameter expanded prior although reasonably flat for heritabilities 
in the range  0.05-0.98 has spikes close to 0 and 1. If the data give 
support in these regions (either because the true value is close to one 
of these extremes, or because there is not much information so the 
likelihood surface is flat) then the posterior can get dragged up into them.


Two possible options are a) to switch to a chi-square prior:

G=list(V=1, nu=1000, alpha.mu=0, alpha.V=1)

Pierre de Villemereuil demonstrates that this can have better properties 
with a probit link model (he uses family="ordinal")


http://onlinelibrary.wiley.com/store/10./2041-210X.12011/asset/supinfo/mee312011-sup-0002-Appendix-B.pdf?v=1=223aa0df3256a23f694f67254cde03841149ae3e

With family="ordinal" the heritability is calculated as m1$VCV[, 1] / 
(rowSums(m1$VCV) + 1). However, I would switch to family="threshold" 
which is standard probit regression and the heritability is simply 
m1$VCV[, 1] / rowSums(m1$VCV). Mixing is generally better in the latter.


Option b) is to assume the heritability is one (rather than estimate 
it).  This can be done using MCMCglmmRAM, and example code can be found 
here.


http://onlinelibrary.wiley.com/store/10./2041-210X.12354/asset/supinfo/mee312354-sup-0003-AppendixS3.pdf?v=1=58d260cb8c9b14812b164806c5f0a9af68af2b23

Note that this is an alternative algorithm for Felsenstein's threshold 
model (which is a probit glmm with h2 fixed at one) and so could be 
fitted using other software.


Cheers,

Jarrod









On 10/02/2016 18:33, Jörg Albrecht wrote:

Dear Grace,

the problem may derive from your specification of the priors. Usually you don’t 
specify the prior for B in MCMCglmm. The problem may also be related to the 
size of your dataset. Estimation of effects can be difficult with binary data, 
when the dataset is small. Below is a small example from Jarrod Hadfield for 
binary regression that accounts for phylogeny.


tree <- rcoal(200)
x <- rnorm(200)
l <- rbv(tree, 1, nodes="TIPS") + x + rnorm(200)
y <- rbinom(200, 1, plogis(l))
dat <- data.frame(y = y, x = x, species = tree$tip.label)
prior1 <- list(R = list(V = 1, fix = 1),
G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 
1000)))
# residual variance fixed at 1.
Ainv <- inverseA(tree)$Ainv
m1 <- MCMCglmm(y ~ x, random = ~ species,
ginverse = list(species = Ainv),
family = "categorical",
prior = prior1, data = dat)
# fixed effects should be around zero and one (+/- monte carlo error)
summary(m1$Sol)
# phylogenetic ICC should be 1/(2 + pi^2/3) = 0.189 (+/- monte carlo error)
summary(m1$VCV[, 1] / (rowSums(m1$VCV) + pi^2/3))

Hope this helps,

Jörg

—
Jörg Albrecht, PhD
Postdoctoral researcher
Institute of Nature Conservation
Polish Academy of Sciences
Mickiewicza 33
31-120 Krakow, Poland
www.carpathianbear.pl <http://www.carpathianbear.pl/>
www.globeproject.pl <http://www.globeproject.pl/>
www.iop.krakow.pl <http://www.iop.krakow.pl/>

Am 10.02.2016 um 00:55 schrieb Grace Pold <grace.p...@gmail.com>:

Hello,

I have characterized a few hundred bacteria from two environments and want to know if the bacteria 
from one environment is more likely to show a trait than the bacteria isolated from the other 
environment. So my data is binary in both the independent and in the dependent variable: 
"environment 1?" yes/no, and "degrades carbon source?" yes/no. If I wasn’t 
accounting for phylogeny, I think I would use a Chi-Squared test. But I would like to account for 
phylogeny in my analysis, since some of the bacteria form clusters on my phylogenetic tree with 
members only from one environment.

I thought maybe I could use MCMCglmm to test this, and have been following the 
examples previously posted on r-sig-phylo and in the MCMCglmm course notes. 
However, my model either fails to converge even after millions of iterations, 
or the autocorrelation plot shows strong positive correlations at a range of 
lags. So I think either I cannot use MCMCglmm for this, or I am specifying my 
model wrong. Any pointers in the right direction would be greatly appreciated.

Here is my model:

prior.m2c.5 = list(B = list(mu = c(0, 0), V = diag(2) *(1 + pi^2/3)), R = 
list(V = 1, fix = 1), G = list(G1 = list(V = 1, nu = 1, alpha.mu = 0, alpha.V = 
1000)))

simplemcmcCMCv2_5<-MCMCglmm(carbon01~environment01, random=~animal, p

Re: [R-sig-phylo] A perfect storm: phylogenetic trees, random effects and zero-inflated binomial data

2015-10-14 Thread Jarrod Hadfield

Dear Diederik,

The lack of convergence is because the residual variance is  
non-identifiable with binary data but you have a very weak prior on  
it. You should fix the residual variance at something (I usually use 1):


prior.test<-list(R=list(V=1,fix=1), G=list(G1=list(V=1, nu=0.002),G2 =  
list(V=1, nu=0.002)))


You might also want to consider using "threshold" rather than  
"categorical". The model is similar but the first uses a probit link  
and the second a logit link, and the MCMC algorithm is more efficient  
for the former. I would also advise updating to version 2.22 as it now  
performs better for phylogenetic models.


You may still find that problems persist with rare-event data, but get  
back to the list if this is the case after fixing the prior.


It has been argued that the h2 (or lambda if you prefer) for  
categorical traits should be fixed at one rather than estimated. If  
you did want to do this (I am not convinced it is always a reasonable  
thing to do) then you can use MCMCglmmRAM  
(http://jarrod.bio.ed.ac.uk/MCMCglmmRAM_2.22.tar.gz).


Cheers,

Jarrod








Quoting Diederik Strubbe  on Wed, 14  
Oct 2015 16:18:28 +0200:



Dear all,



 A while ago, I was kindly advised to try MCMCglmm to investigate
invasion success of non-native species while accounting for phylogenetic
relatedness. I have managed to run some explanatory models but stumble
upon converge problems…



The data are (1) /introduction events/ of non-native species. There are
about 4.000 introduction events, but only about 40 have resulted in an
established population – so there is a low number of ‘events’.
Successful introductions are coded as “1”,  failure is “0”, (2) the
/phylogenetic tree/ is an ultrametric majority-rule consensus tree
(class “phylo”,  number of tips: 359, number of nodes: 298), (3)
/country/ in which species are introduced is included as a random
effect, (4) a set of continuous /explanatory variables/.



I have mainly explored univariate models (ie one explanatory variables
at a time), but including multiple explanatory variables also results in
converge problems. I use the following model, with continuous variable
‘CloseCentralInv’ as single explanatory variable.



prior.test<-list(R=list(V=1,nu=0.002), G=list(G1=list(V=1, nu=0.002),G2
= list(V=1, nu=0.002)))

test.phylo1 <- MCMCglmm(invasiveStatus ~ CloseCentralInv,

random = ~species + country,


ginverse=list(species=inv.mytree$Ainv),

nitt=100, thin=500, burnin=1,

family = "categorical",

data = data.trade,

prior = prior.test,

verbose = FALSE)



As far as I understand, the basic model output looks reasonable (here
,
1), although effective sample sizes for the random effects seem to be
small.



However, /Heidelberger and Welch's/ convergence diagnostic often passes
the stationary test, but not the Halfway mean test (example here
, 2).
/Gelman and Rubin/'s convergence diagnostic (calculated on two chains)
indicates potential scale reduction factors that are often well above 1
(example here
, 3). A
plot of Gelman and Rubin also does not clearly indicate after how many
iterations convergence is to be expected (example here, 4). I tried to
use Raftery and Lewis's diagnostic to estimate the number of necessary
iterations, but this invariably outputs “You need a sample size of at
least 3746 with these values of q, r and s”. Traces for this model run
can be accessed here
 (5).



I tried upping the number of iterations to 1.000.000 for a number of
variables, but this does not seem to ameliorate the convergence problems.



I can see the following causes: (1) I misspecified the model and it is
not doing what I think it should be doing, (2) I actually need much more
iterations, (3) I need to specify stronger priors, (4) I am trying to
get blood from stone (aka the data do not allow such an analysis).



I might add that if I look at the p-values and estimates obtained
through the (non-converging- MCMCglmm runs, the results are very much
similar to a ‘simpler’ glmm where phylogeny is accounted for by using a
nested random effect for taxon and genus.



Any suggestions on how to proceed are much appreciated.



Best wishes and thanks in advance,



Diederik





1.
https://www.dropbox.com/s/kffj8o3nf9nz0xn/summary%28test.phylo1%29.png?dl=0

2.   https://www.dropbox.com/s/t25y37slvkqyd8l/heidel.diag.png?dl=0

3.   https://www.dropbox.com/s/kbxrc73ep1kjli5/gelman.diag.png?dl=0

4.   

Re: [R-sig-phylo] Partitioning phylogentic and environment effect on trait distribution

2015-01-22 Thread Jarrod Hadfield

Hi,

You might try looking at:

A general and simple method for obtaining R2 from generalized linear  
mixed-effects models by Nakagawa  Schielzeth.


They detail some of the issues with comparing null versus full models,  
and it is a well presented paper.


I can't remember whether they deal with the fact that the standard  
estimator for the R^2 is upwardly biased, as is the variance explained  
by the fixed effects. It is easier to see this from a frequentist  
point of view:  imagine a covariate x with an estimated regression  
coefficient bhat. bhat = b + d where b is the true coefficient and d  
is the sampling deviation.


The variance explained by x is  VAR(x*b) = VAR(x)b^2 but  
E[VAR(x*bhat)] (where the expectation is taken over possible  
estimates, bhat) is
VAR(x)b^2+VAR(x)VAR(d). The estimated R^2 is upwardly biased by  
uncertainty in the fixed effects: VAR(x)VAR(d) is always positive. Its  
not that surprising: under least squares bhat is the value that  
explains most of the variance in the data: if b differs from bhat, b  
must explain less.


Cheers,

Jarrod

The derivation for the above is:

 E[VAR(x*bhat)] = E[VAR(x*(b+d))]
= E[VAR(x)(b+d)^2]
= E[VAR(x)(b^2+2bd+d^2)]
= VAR(x)(b^2+VAR(d))

d^2 = VAR(d) and E[2bd]=0 for unbiased estimators because E[d]=0.















Quoting Gustaf Granath gustaf.gran...@gmail.com on Thu, 22 Jan 2015  
13:17:58 +0100:



Hi
Va/(Va+Ve) is the proportion of variance explained by phylogeny  
*after* accounting for the fixed effects. To include the variance  
explained by the fixed effects in the denominator is a bit  
difficult, because you have to  assume something about the  
distribution of predictors. For example do you want the different  
habitat types to be equally represented, or at the frequency with  
which they were sampled?
I see. I thought I could just look at the change in Va with or  
without the fixed effect (full versus null model). The biological  
question is if the observed trait variation is mainly a result of  
phylogeny or the environment, hence the focus on variance explained  
by the different components. I must admit that im not following why  
the distribution of the predictor is important (or what kind of  
approach you are referring to), but I guess that frequency with  
which they were sampled is most appropriate for the question. This  
should be easier for a continuous predictor, right?


Cheers

Gustaf

On 2015-01-22 12:23, Jarrod Hadfield wrote:

Hi Gustaf,

In the model with just species the residual variation is  
measurement error/plasticity error, but could also include  
deviations from the assumed BM process. If you add species.ide that  
term captures deviations from the assumed BM process.


Va/(Va+Ve) is the proportion of variance explained by phylogeny  
*after* accounting for the fixed effects. To include the variance  
explained by the fixed effects in the denominator is a bit  
difficult, because you have to  assume something about the  
distribution of predictors. For example do you want the different  
habitat types to be equally represented, or at the frequency with  
which they were sampled?


Cheers,

Jarrod


Quoting Gustaf Granath gustaf.gran...@gmail.com on Thu, 22 Jan  
2015 11:11:02 +0100:



Jarrod,
Thanks for your input. Dropping species.ide result in a model that  
is estimated without any problems.
prior = list(R = list(V = 1, nu = 0), G = list(G1 = list(V = 1, nu  
= 1, alpha.mu = 0, alpha.V = 1000)))

m1.mcmc - MCMCglmm(y ~ habitat, random =~ species,
   data = traits, ginverse = list(species = Ainv),  
nitt = 70, thin=100, prior = prior)
It seems to be rather robust (not sensitive to priors) and Im not  
sure I understand why you would need 100s of species.


But how is this model related to the simpler approach I described  
in my first post?
The error term (Ve) is the within species error  
(measurement/plasticity error) and the species variance (Va) is  
the phylogenetic variance in my data. Lambda (Pagel's) is then  
Va/(Va+Ve).
(m1.mcmc$VCV[,species] / (m1.mcmc$VCV[,species] +  
m1.mcmc$VCV[,units]))

# BUT is this only correct under a model without habitat?

If habitat is included in the model the between species variation  
not explained by phylogeny or habitat is included in Ve because I  
dropped the speceis.ide, or am I missing something here?


If I run a model with and without habitat I can extract the amount  
of variation explained by habitat (differences in Ve = Vhab) and  
get a ballpark number on the relative contribution to the pattern  
we observe:

Va / (Va+Vhab+Ve) #phylo
Vhab / (Va+Vhab+Ve) #habitat
Ve / (Va+Vhab+Ve) #measurement/plasticity/local adaption  and  
other processes


Did I get that right or am I lost?

Gustaf


On 2015-01-22 04:54, Jarrod Hadfield wrote:

Hi Gustaf,

1/ You can ignore nhabitat: for some reason I thought you thought  
that there might be systematic differences between

Re: [R-sig-phylo] Partitioning phylogentic and environment effect on trait distribution

2015-01-22 Thread Jarrod Hadfield

Hi Gustaf,

In the model with just species the residual variation is measurement  
error/plasticity error, but could also include deviations from the  
assumed BM process. If you add species.ide that term captures  
deviations from the assumed BM process.


Va/(Va+Ve) is the proportion of variance explained by phylogeny  
*after* accounting for the fixed effects. To include the variance  
explained by the fixed effects in the denominator is a bit difficult,  
because you have to  assume something about the distribution of  
predictors. For example do you want the different habitat types to be  
equally represented, or at the frequency with which they were sampled?


Cheers,

Jarrod






Quoting Gustaf Granath gustaf.gran...@gmail.com on Thu, 22 Jan 2015  
11:11:02 +0100:



Jarrod,
Thanks for your input. Dropping species.ide result in a model that  
is estimated without any problems.
prior = list(R = list(V = 1, nu = 0), G = list(G1 = list(V = 1, nu =  
1, alpha.mu = 0, alpha.V = 1000)))

m1.mcmc - MCMCglmm(y ~ habitat, random =~ species,
data = traits, ginverse = list(species = Ainv),  
nitt = 70, thin=100, prior = prior)
It seems to be rather robust (not sensitive to priors) and Im not  
sure I understand why you would need 100s of species.


But how is this model related to the simpler approach I described in  
my first post?
The error term (Ve) is the within species error  
(measurement/plasticity error) and the species variance (Va) is the  
phylogenetic variance in my data. Lambda (Pagel's) is then Va/(Va+Ve).

(m1.mcmc$VCV[,species] / (m1.mcmc$VCV[,species] + m1.mcmc$VCV[,units]))
# BUT is this only correct under a model without habitat?

If habitat is included in the model the between species variation  
not explained by phylogeny or habitat is included in Ve because I  
dropped the speceis.ide, or am I missing something here?


If I run a model with and without habitat I can extract the amount  
of variation explained by habitat (differences in Ve = Vhab) and get  
a ballpark number on the relative contribution to the pattern we  
observe:

Va / (Va+Vhab+Ve) #phylo
Vhab / (Va+Vhab+Ve) #habitat
Ve / (Va+Vhab+Ve) #measurement/plasticity/local adaption  and  
other processes


Did I get that right or am I lost?

Gustaf


On 2015-01-22 04:54, Jarrod Hadfield wrote:

Hi Gustaf,

1/ You can ignore nhabitat: for some reason I thought you thought  
that there might be systematic differences between specialists and  
non-specialists.


2/ species and species.ide are identical, but species effects are  
structured by the phylogeny due to the ginverse specification.  
species.ide are treated as iid.


3/ With few species and few habitats, and very few species observed  
in multiple habitats, I think you will just have to go for the  
simple model you describe (perhaps even dropping species.ide):


m1.mcmc-MCMCglmm(trait~as.factor(habitat)+x, random =~  
species+species.ide, data=my_data, ginverse=list(species=Ainv))


Expect the variance components to be poorly estimated and prior  
sensitive. 100's of species are required to get even moderately  
precise estimates of the phylogenetic variance.


Cheers,

Jarrod



Quoting Gustaf Granath gustaf.gran...@gmail.com on Thu, 22 Jan  
2015 01:15:23 +0100:



Jarrod,
I tried this but Im not sure it works well for my problem and/or  
Im doing something wrong.


First, Im not sure what nhabitat is. A vector specifying the  
number of habitats that the species is found in? I.e.  
my_data$nhabitat may look like:
1,1,1,1,2,2,2,2, 2,2,2,2  if the first two species occurs in  
one and two habitats, respectively.


Second, species.ide. If it is the same as species, why not just: +  
species + species ?

Should it be a vector identical to species but just names species.ide?

So to my data, there are only three habitats (3 levels), thus Im  
not sure it works well modeling it as a random factor (and as you  
indicate we need a reasonable number of habitats). There are 2,  
2 and 11 species in each habitat, and only a few are found in  
multiple habitats. So Im really looking at simpler model because  
there are not enough data for complicated structures. With so few  
species things are pushed to the limit but my guess is that the  
main problem here is that habitat cannot be fitted as a random  
factor. Possible to model it as a fixed effect?


However, there are other predictors of interest as well,  
continuous variables. How would that be modeled then?

Like this (predictor=x)?
m1.mcmc-MCMCglmm(trait~ x, random =~ species+species.ide,  
data=my_data, ginverse=list(species=Ainv))


I would also like to thank Tony for recommending excellent papers  
and I didnt know about the pez package which looks promising.


Cheers

Gustaf


On 2015-01-21 16:26, Jarrod Hadfield wrote:

Dear Gustaf,

How many levels of `habitat' are there, and are they  
cross-classified with respect to species (i.e. are multiple  
species measured in the same habitat

Re: [R-sig-phylo] Partitioning phylogentic and environment effect on trait distribution

2015-01-21 Thread Jarrod Hadfield

Dear Gustaf,

How many levels of `habitat' are there, and are they cross-classified  
with respect to species (i.e. are multiple species measured in the  
same habitat)?


Assuming for now there are a reasonable number of habitats then the  
simplest model (without cross-classification) in asreml/MCMCglmm would  
be


Ainv-inverseA(my_tree)$Ainv

m1.asreml-asreml(trait~nhabitat, random =~  
habitat+giv(species)+species.ide, data=my_data,  
ginverse=list(species=sm2asreml(Ainv)))


m1.mcmc-MCMCglmm(trait~nhabitat, random =~  
habitat+species+species.ide, data=my_data, ginverse=list(species=Ainv))


nhabitat is the number of habitats a species is found in. species.ide  
is the same as species and models the between species variation not  
captured by the BM model. Note that this requires you using the  
individual data rather than the species means (which is to be  
preferred).


With cross-classification you may want to have the effects of the same  
habitat vary across species. There's many ways to do this, but for now  
lets assume i) that in general they could be positively correlated (if  
the trait increases for species 1 in habitat 1 then it is also likely  
to increase the trait in species 2) and ii) if species 1 and species 2  
are close relatives the habitat is likely to effect them in a more  
similar way:


m2.asreml-asreml(trait~nhabitat, random =~  
habitat+giv(species)+giv(species):habitat+species.ide, data=my_data,  
ginverse=list(species=sm2asreml(Ainv)))


m2.mcmc-MCMCglmm(trait~nhabitat, random =~  
habitat+species+species:habitat+species.ide, data=my_data,  
ginverse=list(species=Ainv))


There's many more possibilities of course, depending on the biology of  
the problem.


Cheers,

Jarrod



Quoting Gustaf Granath gustaf.gran...@gmail.com on Wed, 21 Jan 2015  
15:59:23 +0100:



Dear all,
Is it possible to quantify the influence of phylogeny and  
environment on a measured trait?


Say that trait y has been measured in several species located in  
different habitats. Some species have been measured in more than one  
habitat (less specialized species). Now I want to say something  
about how much variation in my trait that may be explained by  
phylogenetic structure (I have the phyl tree) and the habitat. I  
also have multiple measurements per species in the habitat(s) where  
is occurs (replicates) but lets ignore that for now and only use the  
means.


One way would be to calculate phylogenetic corrected trait means (as  
in Bloomberg's K).
Here it is easy to add rows/columns in the vcv.phylo(tree) for  
species that occurs in more than one habitat and it is assumed that  
they have the same postion in the tree (ie within-species  
correlation is = 1)
Residual sums of squares for the phylo effect can be calculated and  
the corrected trait means are used in a linear model to test the  
habitat effect.

R2 for both factors (phylo, habitat) can then be calculated.

Another method is to use gls(). But I dont know how to deal with  
species occurring in multiple habitats here (can you feed the gls  
with a manually contructed var-covariance matrix?). If I drop some  
species x habitat combinations (so species can only occur in one  
habitat) I can do:

bm.sp - corBrownian(phy = tree)
fit0 - gls(trait ~ 1, sp.means, method = ML)
fit1 - gls(trait ~ 1, sp.means, correlation = bm.sp, method = ML)
fit2 - gls(trait ~ Habitat, sp.means, correlation = bm.sp, method = ML)
anova(fit0, fit1, fit2) #check AIC

#pseudoR2 (not perfect, I know, but indicate somewhat the importance)
G2 - (-2*(logLik(fit0) - logLik(fit1)))[1]
n = nrow(sp.means)
r2ML.phyl - 1 - exp(-G2/n) #phyl effect

G2 - (-2*(logLik(fit0)-logLik(fit2)))[1]
r2ML.both - 1 - exp(-G2/n)
r2ML.hab - r2ML.both - r2ML.phyl #habitat effect

As mentioned, there are many assumptions here (BM etc) but I wonder  
if this sort of approach is in general valid?


Is there a standard way to deal with these kind of problems (e.g.  
same species in multiple groups)?


Is there a way to incorporate replicates? (manually add/rows columns  
in the phylogenetic varcovar-matrix?)
(this has been on the list before, MCMCglmm has been mentioned and  
maybe a more complete modeling approach is the way to go here)


Key papers?

Cheers

Gustaf

--
Gustaf Granath (PhD)
Post doc
Swedish University of Agricultural Sciences
Department of Aquatic Sciences and Assessment

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/






--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] WG: Re: Re: MCMCglmm for categorical data with more than 2 levels - prior specification?

2013-08-08 Thread Jarrod Hadfield

Hi,

The IJ prior (or posterior) implies that the variance in each  
probability is constant and that probabilities of different outcomes  
are mutually independent, conditional on the constraint that they must  
sum to one. To see why, let V be the covariance matrix of  
log-contrasts (either at the phylogenetic or residual level) then:


V[1,1] = VAR(LP_2-LP_1)
   = VAR(LP_2)+VAR(LP_3)-2COV(LP_2,LP_1)

and

V[1,2] = COV(LP_2-LP_1, LP_3-LP_1)
   = COV(LP_2, LP_3)-COV(LP_2,LP_1)-COV(LP_3,LP_1)+VAR(LP_1)

where LP_i = log(Pr(nominal[i])) from previous emails, and LP_1 is the  
log probability for the baseline category. If we would like to have a  
prior where VAR(LP_i) is constant (VAR(LP)) for all i, and  COV(LP_i,  
LP_j) = 0 for all i and j, then:


V[1,1] = 2*VAR(LP)

and

V[1,2] = VAR(LP)

so a sensible prior is proportional to an I+J matrix where I is the  
identity matrix and J a unit matrix (a matrix of all ones).


My guess is that the mixing/convergence problems are due to numerical  
issues if this is the same dataset that your other post (comp.gee not  
converging) refers to. Check out the latent variables as I have  
already suggested - do their absolute values exceed 25? If so you need  
to find out why (very high phylogenetic heritability, extreme category  
problems for the fixed effects etc.)


Cheers,

Jarrod










Quoting Sereina Graber sereina.gra...@gmx.ch on Thu, 8 Aug 2013  
15:02:20 +0200 (CEST):




Hi Jarrod, hi all,

I am still struggling with that MCMCglmm function:

First, in the course notes I have read that for some reason which
should come clearer later on in the text the IJ matrix is used for the
prior of the residuals and the random effects in the multinomial
model. Why especially this matrix?

Second, probably a very stupid questions: if the model did not
converge, you have to run it longer, so increase the number of
iterations, right? However, when I am increasing the number of
iterations (increased from 12,000 to 100,000, there are still trends
in the times series plots. What can I do then? what else might be the
problem here? And also related to that, in the last email you wrote
that there might be a problem du to my small effect sizes, however, it
also seems that those do not increase with increasing number of
iterations.

I am very thankful for some help.

Cheers,
Sereina

GESENDET: Freitag, 02. August 2013 um 14:54 Uhr
VON: Jarrod Hadfield j.hadfi...@ed.ac.uk
AN: Sereina Graber sereina.gra...@gmx.ch
CC: r-sig-phylo@r-project.org
BETREFF: Re: Aw: Re: [R-sig-phylo] WG: Re: Re: MCMCglmm for
categorical data with more than 2 levels - prior specification?
Hi,

They are the effect of the covariates on the probability of being in
the categories 2,3,4 versus category 1. Note that your effective
sample sizes are very small which means mixing is a problem and you
need to run it for longer. Numerical/Inferential problems can also
occur if the joint distribution of the predictors and the outcomes
results in `extreme categorical problems'. You then might want to
follow Gelman's advice on priors for fixed effects. See the function
gelman.prior.

Cheers,

Jarrod

Quoting Sereina Graber sereina.gra...@gmx.ch on Fri, 2 Aug 2013
14:48:44 +0200 (CEST):

Great, thanks a lot! Then I have one last question: How do I have to
interpret the following output of the location effects? the first
three lines I guess represent the intercepts of categories 2 to 4, but
how I should I interpret the rest having the two covariates lnBrain
(continuous) and binary (binary).

With the following model...

myMCMC.phyl- MCMCglmm(nominal ~ trait-1+ trait:lnBrain +
trait:binary, random=~us(trait):species, rcov = ~us(trait):units,
pedigree=bird.tree,
+ data = bird.data, family=categorical,
+ prior=Prior.phyl6)

...I got the following location effects:

Location effects: nominal ~ trait - 1 + trait:lnBrain + trait:binary
post.mean l-95% CI u-95% CI eff.samp pMCMC

traitnominal.2 5.59844 4.49565 6.90609 9.676 0.001
***
traitnominal.3 -4.12383 -5.58366 -2.65665 7.794 0.001
***
traitnominal.4 -1.70863 -2.86831 -0.38491 12.770 0.006
**
traitnominal.2:lnBrain -0.08244 -2.10570 1.57463 3.228 0.880

traitnominal.3:lnBrain -1.29069 -3.36790 1.08456 3.790 0.376

traitnominal.4:lnBrain -0.53814 -2.76265 1.67985 3.859 0.762

traitnominal.2:binary2 -9.59263 -16.21345 -3.88906 3.403 0.001
***
traitnominal.3:binary2 13.37745 9.26769 19.93064 4.247 0.001
***
traitnominal.4:binary2 8.61585 3.82747 15.54171 3.446 0.001
***
---

Best  thank you so much for your help!

GESENDET: Freitag, 02. August 2013 um 13:55 Uhr
VON: Jarrod Hadfield j.hadfi...@ed.ac.uk
AN: sereina.graber sereina.gra...@gmx.ch
CC: r-sig-phylo@r-project.org
BETREFF: Re: [R-sig-phylo] WG: Re: Aw: Re: MCMCglmm for categorical
data with more than 2 levels - prior specification?
Hi,

1.) There is no difference between the arguments pedigree=bird.tree
and ginverse = list(species=Ainv) where Ainv is defined by
Ainv=inverseA(bird.tree)$Ainv. The latter

Re: [R-sig-phylo] MCMCglmm for categorical data with more than 2 levels - prior specification?

2013-08-02 Thread Jarrod Hadfield

Hi Sereina,

You should not get that error message when you do not specify a prior  
- but if you do can you let me know.


For the prior you specified you get the error message because  
us(trait):units is specifying a 3x3 covariance matrix, and yet your  
prior, R=list(V=1,nu=0.002), is specifying a 1x1 matrix. V should be a  
3x3 matrix, but note that the residual covariance matrix with  
categorical data cannot be estimated from the data. For this reason  
most people would not fit a weak prior (i.e. nu=0.002) but fit a very  
strong prior (fixing it at some value a priori using fix=1 in the  
prior specification). The choice of residual covariance matrix is  
arbitrary - the results can always be expressed in a way that do not  
depend on the choice of residual covariance matrix (See the  
CourseNotes).


The fixed and random effect formulae are also a bit odd.  This type of  
model is essentially equivalent to a trivariate model where the three  
traits (on the latent scale) are the differences on a log scale  
between the probability of being in categories 2,3 or 4 compared to  
category 1:


 log(Pr(nominal[2]))-log(Pr(nominal[1]))
 log(Pr(nominal[3]))-log(Pr(nominal[1]))
 log(Pr(nominal[4]))-log(Pr(nominal[1]))

where nominal[1] is called the baseline category. You can change the  
baseline category by reordering the factor levels in nominal.


By having ~animal in the random formula you are assuming that a) the  
phylogenetic variance for each contrast is equal  and b) that the  
correlation between the phylogenetic effects is one. This may make  
sense in some models and with some types of base-line category, but  
not generally I think. us(trait):animal allows the phylogenetic  
variances to differ over the traits and for each pair of traits to  
have a unique phylogenetic correlation. There are also other variance  
structures that can be fitted that are somewhere between these two  
extremes.


For the same reason you probably want to have trait specific  
intercepts and trait specific regression coefficients for the  
covariates.  This can be achieved by having:


~ trait-1+trait:lnBrain + trait:binary.x

I remove the global intercept (-1) because I find the model output  
easier to interpret, but it is not necessary.


You need to be careful with this type of model on these type of data,  
because generally there is not much information from data on extant  
taxa about the parameters of comparative analyses, particularly when  
the data are categorical. This means that priors, even ones that  
appear innocuous such as flat priors, may have a substantial influence  
on the posterior. In addition, numerical problems may exist in  
categorical models when the posterior distribution for the  
phylogenetic intra-class correlations has support in regions close to  
one (either because the true value is close to one, or because the  
posterior distribution is very wide because the data are not very  
informative).  This can be checked by saving the latent variables  
(pL=TRUE in the call to MCMCglmm) and making sure that the absolute  
values of the latent variables do not regularly exceed 20. Lastly,  
mixing may be (very) poor so you may have to wait an inordinate amount  
of time to completely sample the posterior.


Cheers,

Jarrod





Doing this is fine: you can always rescale the model parameters post analysis



Quoting Sereina Graber sereina.gra...@gmx.ch on Fri, 2 Aug 2013  
10:17:58 +0200 (CEST):




Hi all,

I am doing a phylogenetic analysis using the MCMCglmm package with the
phylogenetic tree as the pedigree (Hadfield  Nakagawa 2010). I have a
categorical response variable (nominal) with more than 2 categories
(4 categories in total) and a continuous and a binary explanatory
variable. My model:

mod-MCMCglmm(nominal ~ lnBrain + binary.x, random= ~animal,
family=categorical,rcov=~us(trait):units, prior=prior4,
data=bird.data, pedigree=bird.tree)

Now there is always the following error message appearing if I do not
specify any priors, thus, using the default:

 Error in priorformat(if (NOpriorG) { :
  V is the wrong dimension for some prior$G/prior$R elements

However, I then tried different priors which didn`t work, because I
would have the wrong dimensions in the prior...can any one help me
with how I have to specifiy the priors correctly, what dimensions do I
need? My priors:

var2-cbind(c(1e+08,0.1,0.1), c(0.1,1e+08,0.1),c(0.1,0.1,1e+08))
prior4-list(R=list(V=1,nu=0.002), G=list(G1=list(V=1,
nu=0.002)),B=list(mu=rep(0,3), V=var2))

These priors lead to the error:

Error in priorformat(if (NOpriorG) { :
  V is the wrong dimension for some prior$G/prior$R elements

For any help I am very grateful.

Best


--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable 

Re: [R-sig-phylo] MCMCglmm: G-structure R-Structure

2012-12-17 Thread Jarrod Hadfield

Hi Sam,

The terminology G and R structure is used widely, for example in  
ASreml  SAS and probably others. The G-structure is the covariance  
matrix of the random effects and the R-structure is the covariance  
matrix of the residuals. In your model you have one random term  
(animal)  and one residual term (the default - units).


You haven't posted your model specification only the summary, so I  
will presume that the animal term is linked to a phylogeny. There is a  
single random effect for each of the n species and they have  
correlation structure A. A is an nxn matrix representing the  
proportion of evolution time shared between each species. The  
covariance structure is Va*A where Va is a scalar variance estimated  
from the data.  The ... in the  prior:


G=list(G1=list(...))

should contain your specification for the prior on Va.

units is a factor with a unique level for each row of the data.frame.  
The default ~units therefore has the standard correlation structure I  
for the residuals. I is an identity matrix (ones on the diagonal  
zero's elsewhere) and represents the fact that we believe the  
residuals are independent and homoscedastic. If there is only one  
observation per species I is nxn also.  The covariance structure is  
Ve*I where Ve is a scalar variance estimated from the data.  The ...  
in the  prior:


R=list(...)

should contain your specification for the prior on Ve.

Together we have the prior

prior=list(R=R,G=G)

You have this bit of code (which I've modified a bit):

l = rbv(tree, Va, nodes=TIPS) + rnorm(100, 0, sqrt(Ve))

which generates random data with covariance structure  Va*A+Ve*I. Va=1  
and Ve=1 in your example and  Va/(Va+Ve)=0.5 is sometimes known as the  
phylogenetic heritability (Lynch 1991) or more commonly Pagel's lambda  
(Pagel 1999). Note that I've removed x from the code because it  
doesn't really make sense to have it there. More likely you want  
something like:


l = rbv(tree, Va, nodes=TIPS) + as.matrix(x)%*%beta + rnorm(100, 0,  
sqrt(Ve))


where beta is a vector of regression coefficients (beta=c(1, 1))  
determining the effect of being arboreal and being nocturnal  
respectively on energy use. The data have the same covariance  
structure as before conditional on the vector of means (x%*%beta).


However, the response in your analysis is not l but Energy. This was  
generated with as:


 50 + rnorm(100, 0, sqrt(Ve))

where Ve=20^2=400.

The summary shows that the posterior distribution returned by MCMCglmm  
has support for the true values (Va=0, Ve=400, beta1=0, beta2=0). For  
example Va is associated with the animal term in your model and so has  
a posterior mean of 2.186 and 95% credible interval of 0 - 6.493.


If you rerun the analysis with l rather than Energy

data = data.frame(l, x, animal=tree$tip.label)

prior=list(R=list(V=1,nu=0), G=list(G1=list(V=1, nu=1, alpha.mu=0,  
alpha.V=1000)))


# here I've used a flat *improper* prior for the Ve and a parameter  
expanded prior for Va. The Va prior is a scaled F-distribution with 1  
df for the numerator and denominator. Parameter expanded priors have  
some nice properties, but the main advantage of parameter expansion is  
algorithmic - it increases the efficiency of the MCMC algorithm when  
the posterior for Va contains values close to zero.


m2-MCMCglmm(l~arboreal+nocturnal, random=~animal, data=data,  
pedigree=tree, prior=prior)


The summary shows that the posterior distribution returned by MCMCglmm  
has support for the true values (Va=1, Ve=1, beta1=1, beta2=1).




summary(m2)


 Iterations = 3001:12991
 Thinning interval  = 10
 Sample size  = 1000

 DIC: 285.753

 G-structure:  ~animal

   post.mean l-95% CI u-95% CI eff.samp
animal  1.49   0.16963.445442.9

 R-structure:  ~units

  post.mean l-95% CI u-95% CI eff.samp
units0.8819   0.61871.208 1000

 Location effects: l ~ arboreal + nocturnal

post.mean l-95% CI u-95% CI eff.samp  pMCMC
(Intercept)   -1.0719  -2.3445   0.0800139.1  0.082 .
arboreal   1.3163   0.9545   1.6891   1000.0 0.001 ***
nocturnal  0.9639   0.5477   1.3512   1108.4 0.001 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Have you consulted the CourseNotes? I think not.

Cheers,

Jarrod


Quoting Sam Passmore passmore@gmail.com on Sat, 15 Dec 2012  
11:23:16 +1300:



Dear R-Sig-Phylo subscribers,



I am currently trying to perform some analysis using the MCMCglmm package,
however I am having trouble understanding some of the aspects of the
package and wondered if I could get your help?



I have tried to develop an example of my problem that I hope will make my
problems clear, but they are primarily associated around the meanings of
G-structure and R-structure. If anyone could give a broad definition of
what these parts of the analysis do and how they affect the results, that
would be most useful.



I have a list of species in a data frame with traits for each 

Re: [R-sig-phylo] Phylogenetic signal and PGLS

2012-11-29 Thread Jarrod Hadfield

Hi,

ASReml is another option, which uses REML. It takes 1/10th of a second  
on a 1000 tip phylogeny and is considerably more flexible.


fit-asreml(y~x,random=~giv(species),data=dat,ginverse=list(species=sm2asreml(Ainv)))

# with the data set up as:

ntips-1000
tree-rcoal(ntips)  # simulate tree
Ainv-inverseA(tree)$Ainv   # get sparse inverse of tree
x-rnorm(ntips) # simulate covariate
lambda-0.5
y-rbv(tree,lambda, nodes=TIPS)+rnorm(ntips,0, sqrt(1-lambda)) # simulate y

dat-data.frame(y=y,x=x, species=tree$tip.label)

you will need the inverseA and sm2asreml helper functions in MCMCglmm.


Quoting Liam J. Revell liam.rev...@umb.edu on Wed, 28 Nov 2012  
15:36:00 -0500:



Hi Antigoni.

The other responders are correct, but just to be a little more concrete:

# using ape  nlme
library(ape); library(nlme)
# assuming your data are contained in named vectors x  y
y-y[names(x)]
fit-gls(y~x,data.frame(x,y),correlation=corPagel(1,tree,fixed=FALSE),method=ML)
# assuming your data are in data frame X, with column names var1,
# var2, etc.; for example:
fit-gls(var1~var2+var3,X,correlation=corPagel(1,tree,fixed=FALSE),method=ML)

# using caper
library(caper)
# assuming your data are in named vectors x  y
y-y[names(x)]
X-data.frame(x,y,Species=names(x))
fit-pgls(y~x,comparative.data(tree,X,Species),lambda=ML)

# using phytools
library(phytools)
# assuming your data are in named vectors x  y
fit-phyl.resid(tree,x,y,method=lambda)

Although I am the author of phytools, I would say that the first two  
options should be preferred in general because they can be used with  
generic functions such as summary  residuals. (With the *possible*  
exception that phyl.resid seems to be a little bit faster on large  
trees. pgls in caper, for instance, took more than 4 min. to run a  
tree of 1000 tips on my computer.)


I hope this helps.

All the best, Liam


Liam J. Revell, Assistant Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org

On 11/28/2012 12:02 PM, Antigoni Kaliontzopoulou wrote:

Hello everyone,

I am trying to implement Liam Revell's suggestion on the evaluation of
Pagel's lambda simultaneous to fitting PGLS to minimize the effects of
wrong model selection (OLS vs. PGLS) on species data (i.e. Revell 2010,
Methods in Ecology and Evolution 1: 319-329).

Does anyone know if this kind of simultaneous optimization of lambda and
PGLS parameters is presently implemented in R? The obvious place would
be phytools, but I could not find anything similar in that package.

I would be grateful if you can provide any suggestions.

Antigoni



___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo






--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] specifying priors in MCMCglmm - phylogenetic logistic regression

2012-10-03 Thread Jarrod Hadfield

Hi,

Quoting Margaret Evans mekev...@yahoo.com on Mon, 24 Sep 2012  
22:56:48 +0100 (BST):



Hello all,

I have a few questions concerning the specification of flat priors  
(on the probability scale) for a phylogenetic logistic regression in  
MCMCglmm.


1) First, I'd like to verify my understanding of the default
priors in MCMCglmm. Specifically, are they flat on the probability  
scale or not?  It

 seems like, from reading section 2.6 in the MCMCglmm course notes, that the
default priors for the fixed effects (intercept term, as well as
predictor[s]) are not flat on the probability scale. 


You are correct, the default priors for the fixed effects are flatish  
(as long as the predictors don't have very small variances) on the  
logit scale but not on the probability scale.  Generally this only  
becomes an issue when you have (near) complete separation. For  
categorical predictors this means there are very few 0's or 1's for  
some level of the fixed effects.  Gelman 2008 discusses priors for the  
fixed effects in logistic regression. In  
http://permalink.gmane.org/gmane.comp.lang.r.lme4.devel/8608 I provide  
a function (prior.scale) for obtaining a Gelman like prior which can  
be passed to B$V without having to rescale the predictors.  B$V needs  
to be scaled by the total variance + pi^2/3 with logit link.





2) I'm trying to use the alternative priors (flatter on the  
probability scale) suggested in section 2.6 of the MCMCglmm course  
notes. 

For a model (phylogenetic logistic regression) with two fixed effects
(no intercept), the prior is as shown in the course notes:

prior.flat - list(B = list(mu = c(0, 0), V = diag(2)*(1 + pi^2/3)), R
 = list(V = 1, fix=1), G = list(G1 = list(V = 1, nu = 0.002)))

For a model with a single fixed effect (no intercept), would it be:

prior.flat - list(B = list(mu = 0, V = (1 + pi^2/3)), R
 = list(V = 1, fix=1), G = list(G1 = list(V = 1, nu = 0.002)))

DID I GET THIS RIGHT???  I have a bad feeling that the V term in the  
B list is not right.


See above. In the special case of an intercept only, prior.scale will  
return a 1 which should be scaled by the variance (rcov variance +  
random effect variance + pi^2/3). This is what you have done except  
you have set the random effect variances to zero.


3) Removing the intercept term is recommended in section 2.6 of the  
MCMCglmm course notes, but this is justified in terms of  
separation...all (or nearly all) zeros or ones in one treatment  
class.  For one, my predictor is continuous, and for two I don't  
have this separation problem. Can anyone provide further insight  
into the rationale for removing the intercept? I should decide to  
include it or not based on the usual criteria (posterior  
distribution of the intercept term strongly overlaps zero, BIC-type  
information criterion??). If I exclude the intercept term, does that  
mean I am forcing the regression through the origin?


Removing the intercept is not necessary if you use prior.scale.  
Removing the intercept in the example of a two level fixed predictor  
(with levels A and B) results in two parameters that refer to the  
logit probability of being 1 in group A and the logit probability of  
being 1 in group B. The prior is roughly flat for these probabilities.  
If the intercept is not removed the default contrasts in R are such  
that the intercept is the logit probability of being 1 in group A and  
the other parameter is the *difference* in probability on the logit  
scale of being 1 in group B versus group A.  Using a prior from  
prior.scale will result in the same prior regardless of the  
parameterisation (i.e. whether the intercept is dropped or not).  
prior.scale will also deal with continuous predictors in the same way  
that Gelman recommends. Complete separation can be hard to detect when  
there are multiple and/or continuous predictors. In the simplest case  
no overlap in the predictor values associated with the zeros and the  
ones would result in complete separation.


Generally I find that it is the prior on the variance components  
rather than the fixed effects which have a greater influence on the  
results.   Pierre Villemereuil is about to publish a paper in Methods  
in Ecology and Evolution looking at prior specifications in binary  
models when estimating heritabilities from pedigreed populations which  
may be useful.


Cheers,

Jarrod








with thanks,
Margaret

[[alternative HTML version deleted]]






--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] testing binomial characters

2012-08-30 Thread Jarrod Hadfield

Hi,

Regarding the blog and the feasibility of MCMCglmm for threshold models:

If y1 is binary and y2 is normal, then the univariate analysis would be:

Ainv-inverseA(tree)$Ainv

m1-MCMCglmm(y1~y2, random=~species,ginverse=list(species=Ainv),  
data=my.data, prior=my.prior, family=ordinal)


family=ordinal uses probit link (the original threshold model),  
family=categorical uses logit link.


The bivariate analysis would be

m2-MCMCglmm(cbind(y2,y1)~trait,  
random=~us(trait):species,rcov=us(trait):units,  
ginverse=list(species=Ainv), data=my.data, prior=prior,  
family=c(ordinal, gaussian))


here 2x2 covariance matrices are set up at the phylogenetic and  
residual level. If we have these matrices as Va and Ve, then the  
assumption of the univariate model is that Va[2,1]/Va[1,1] =  
Ve[2,1]/Ve[1,1].  Alternative covariance structures could be used, for  
example ~idh(trait):species sets Va[2,1]=0 and ~idh(trait):units sets  
Ve[2,1]=0. Other less useful covariance functions for this type of  
model might be ~idv(trait):species (Va[2,1]=0, Va[1,1]=Va[2,2])  and  
~species (Va[2,1]=Va[1,1]=Va[2,2]).


The prior is important in models with discrete characters, because the  
residual variation is not identifiable. I fix it a priori at one, for  
example in the univariate model:


prior$R=list(V=1, fix=1)

and the bivariate model

prior$R=list(V=diag(2), fix=2, nu=1.002)

although you need to be more careful here, because the residual  
variation in y2 is identifiable and may be sensitive to the prior  
(inverse gamma in this instance). Note the ordering of the response  
(y2,y1) so that fix=2 fixes Ve[2,2]=1 (the residual variance of the  
binary character).


The choice of fixing it at one is entirely arbitrary (most software  
chooses zero). Different values do change the estimates, but they can  
always be rescaled to what would have been estimated given another  
choice (see CourseNotes).


I see there was some discussion of what to do when y1 has more than 2  
states. family=ordinal takes any number of states which are taken to  
be ordered.  family=categorical takes any number of states which are  
not ordered.  categorical with probit link could be implemented  
(literally change plogis to pnorm at the 2/3 places it appears in the  
C code) but my guess is the answers will be indistinguishable in most  
cases.


There are some drawbacks to modelling dependencies through correlated  
overdispersion (residuals) for non-gaussian data. However, I think  
these drawbacks do not apply to discrete data where overdispersion  
does not exist.


With regards to the earlier posts on asymmetric transitions, if a  
probit link is used  pnorm(0,intercept, sqrt(1+Va+Ve)) is  
approximately q01/(q10+q01) from the MK models. I have not been  
successful at finding an equivalence for q10+q01.


Cheers,

Jarrod




Quoting Joe Felsenstein j...@gs.washington.edu on Wed, 29 Aug 2012  
12:41:48 -0700:




Theodore Garland Jr wrote:


Check this:

Ives, A. R., and T. Garland, Jr. 2010. Phylogenetic logistic  
regression for binary dependent variables. Systematic Biology  
59:9-26.


In addition, check this:

Felsenstein, J.  2012. A comparative method for both discrete and  
continuous characters using the threshold model. American Naturalist  
179: 145-156.


Joe

Joe Felsenstein  j...@gs.washington.edu
 Dept of Genome Sciences and Dept of Biology, Univ. of Washington,  
Box 5065, Seattle Wa 98195-5065


___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo






--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] asymmetric transitions

2012-08-17 Thread Jarrod Hadfield

Hi,

Thanks for the Allman  Rhodes paper, it is very nice. For me at least  
it confirms my suspicions, but made me realise that claims of  
asymmetric transition rates are only suspicious if you are unprepared  
to make some (strong?) assumptions. If anyone disagrees with what I  
have written below, then please tell me and I will try again to  
understand this stuff:


Identifiability is achieved because the pdf for the root state is the  
stationary distribution (denoted by sigma in Allman  Rhodes: see  
example 1).  This is, I believe, the default in newer versions of  
Mesquite, although in older versions the distribution is 0.5/0.5 as in  
ace.


If the pdf of the root state is defined by an additional parameter,  
this leaves a single parameter to describe the rate of transitions,  
and asymmetrical transition rates are non-identifiable.  It seems to  
me there is a choice to be made between a) assuming the same processes  
after the root held before the root and talk about asymmetric  
transition rates or b) do not make this assumption and then admit that  
the rates of transition from 0-1 and 1-0 are not separable. I don't  
think the data can be used to distinguish between these view points,  
and so its a matter of personal choice which interpretation/model is  
used.


Cheers,

Jarrod








Quoting Mark Holder mthol...@ku.edu on Thu, 16 Aug 2012 23:41:45 -0500:


Hi,

I agree that model testing between ARD vs MK models is going to be  
misleading when the process is really described by a threshold model  
(and sorry for ignoring that set of simulations by Jarrod  
previously; somehow I misfiled that email and didn't see it).


The threshold model has nice ways of dealing with correlations among  
characters.  However, when it is applied as the underlying model for  
a single binary character (as in Jarrod's sims), the threshold model  
is similar to the single-site version of the covarion model (Tuffley  
and Steel's version).


I don't think the models are identical, but they are quite similar.  
I suspect that if you generated a data set under one of the models,  
it would be quite hard to determine which was the generating model.  
Instead of just having a an on and off state (as in the covarion  
model), the threshold model has a continuum (the further the  
underlying continuous trait is from boundary, the more off the  
observable binary trait is).  Allman and Rhodes (2009, ref below)  
proved some results on the identifiability of generalizations of  
covarion processes. They considered models with more hidden rate  
categories (not just rate of zero and an rate of evolution when in  
the on state). I believe that their results were that the number  
of hidden rate categories that you can identify cannot exceed the  
number of observable states. So it may be hard to get much richer  
than the Tuffley+Steel covarion when you have a binary character.


Which is a long way of saying that, it might be worth looking at the  
covarion model variants for the types of data that Jarrod is  
interested in.  Implementations of the covarion model for two states  
is quite fast and tractable. Testing Mk+covarion vs ARD+covarion may  
indeed be a more robust way of

detecting asymmetry in rates of character transitions compared to Mk vs ARD.


Thanks for pointing out the Boettiger et al paper, Matt.


all the best,
Mark


[1]	E. S. Allman and J. A. Rhodes, “The Identifiability of Covarion  
Models in Phylogenetics,” IEEE/ACM Transactions on Computational  
Biology and Bioinformatics, vol. 6, no. 1, pp. 76–88, Jan. 2009.





On Aug 16, 2012, at 10:07 PM, Matt Pennell wrote:


correction: the last sentence should have read

I wonder how that would work in this case. I think these are  
important questions going forward.


On Thu, Aug 16, 2012 at 11:00 PM, Matt Pennell mwpenn...@gmail.com wrote:
Hey all,

This has been a really fantastic discussion. Mark, you make some  
really excellent points in response to my earlier comments. I think  
you are correct in this.


The question that arises out of Jarrod and Dan's simulations (which  
I have just run) is whether a model selection criteria would be  
able to distinguish MK from the threshold model that Felsenstein  
(and Wright before him) put forth? And how do we best assess model  
adequacy? Carl Boettiger and company (2012: Evolution) suggested a  
Phylogenetic Monte Carlo approach for continuous characters. I  
wonder how that would before  I think these are important questions  
going forward.


thanks again,
matt



On Thu, Aug 16, 2012 at 10:43 PM, Dan Rabosky drabo...@umich.edu wrote:

Hi all-

A couple of points. I am actually less concerned about the Type I  
error rates I gave in that previous message for the equal rates  
markov process, even though I think they are real (e.g., I can  
corroborate them using Diversitree). I don't think it is an issue  
of ascertainment bias, but I think Mark may be right about the LRT  
being 

Re: [R-sig-phylo] asymmetric transitions

2012-08-17 Thread Jarrod Hadfield
 the parts of the  
tree that happen to have high rates of change and the parts of the  
tree that have low rates of change (since the ARD assumes that there  
is a constant rate of change, and all apparent difference is rate  
are sampling error).


The simulated data should show close the unbiased transitions in the  
fast changing parts of the tree, so Mk+covarion should do a better  
job than ARD.



Mk+covarion vs ARD+covarion
==
Both models should do a good job of not getting distracted by large  
expanses of the tree that are fixed (they won't attribute this as  
strong evidence in favor of that character).  If there does appear  
to be a strong bias in the parts of the tree with lots of changes,  
then ARD+covarion will win. Mk+covarion has fewer parameters. So it  
should win unless there is strong evidence for asymmetry in the fast  
parts of the tree.






On Aug 17, 2012, at 6:29 AM, Dan Rabosky wrote:



Hi folks-

I still think there is a difference between (i) parameter  
identifiability, which may or may not be a problem here, and (ii)  
strong support for the wrong model, which clearly appears to be  
occurring here (e.g., Type I error rates  0.75). I don't think  
non-identifiability of a parameter implies that you'll get  
massively inflated Type I error rates for models that include the  
parameter.


Also, the root state (and assumptions regarding it) don't seem to  
drive the pattern - you can set the root to 0 under the threshold  
model (e.g., both states equiprobable at root) and you recover the  
same strong bias - Jarrod's threshold example set the root to -1,  
you can verify that the problem holds with equiprobable root  
states). At this point I guess I'd be more concerned about model  
misspecification and its diagnosis (when modeling discrete  
characters) than about identifiability per se, though perhaps I am  
not thinking about this correctly. Distinguishing MK/ARD from  
threshold may be difficult, but there is clearly signal of these  
processes in the data: when you fit MK/ARD to data simulated under  
a threshold model, you get strong support for ARD - but not when  
you fit those same models to data generated under MK. So - there is  
clearly something in the data that is sufficiently informative  
about the processes such that we observe high error rates.


Mark, thanks for pointing out the relationship between the  
threshold model and the single-site covarion model.


Cheers,
~Dan






On Aug 17, 2012, at 6:31 AM, Jarrod Hadfield wrote:


Hi,

Thanks for the Allman  Rhodes paper, it is very nice. For me at  
least it confirms my suspicions, but made me realise that claims  
of asymmetric transition rates are only suspicious if you are  
unprepared to make some (strong?) assumptions. If anyone disagrees  
with what I have written below, then please tell me and I will try  
again to understand this stuff:


Identifiability is achieved because the pdf for the root state is  
the stationary distribution (denoted by sigma in Allman  Rhodes:  
see example 1).  This is, I believe, the default in newer versions  
of Mesquite, although in older versions the distribution is  
0.5/0.5 as in ace.


If the pdf of the root state is defined by an additional  
parameter, this leaves a single parameter to describe the rate of  
transitions, and asymmetrical transition rates are  
non-identifiable.  It seems to me there is a choice to be made  
between a) assuming the same processes after the root held before  
the root and talk about asymmetric transition rates or b) do not  
make this assumption and then admit that the rates of transition  
from 0-1 and 1-0 are not separable. I don't think the data can  
be used to distinguish between these view points, and so its a  
matter of personal choice which interpretation/model is used.


Cheers,

Jarrod








Quoting Mark Holder mthol...@ku.edu on Thu, 16 Aug 2012 23:41:45 -0500:


Hi,

I agree that model testing between ARD vs MK models is going to  
be misleading when the process is really described by a threshold  
model (and sorry for ignoring that set of simulations by Jarrod  
previously; somehow I misfiled that email and didn't see it).


The threshold model has nice ways of dealing with correlations  
among characters.  However, when it is applied as the underlying  
model for a single binary character (as in Jarrod's sims), the  
threshold model is similar to the single-site version of the  
covarion model (Tuffley and Steel's version).


I don't think the models are identical, but they are quite  
similar. I suspect that if you generated a data set under one of  
the models, it would be quite hard to determine which was the  
generating model. Instead of just having a an on and off  
state (as in the covarion model), the threshold model has a  
continuum (the further the underlying continuous trait is from  
boundary, the more off the observable binary trait is).  Allman  
and Rhodes (2009, ref below

[R-sig-phylo] asymmetric transitions

2012-08-16 Thread Jarrod Hadfield

Hi,

I have been helping someone with some analyses and came across some  
routines to estimate asymmetric transition rates between discrete  
characters. This surprised me because its fairly straightforward to  
prove that asymmetric transition rates cannot be identified using data  
collected on the tips of a phylogeny. However when I run these  
routines (e.g. ace) likelihood ratio tests often suggest strong  
support for asymmetric rates. This seems to arise because there is an  
implicit (and unjustified) prior point mass on the ancestral state  
*probability*. If anyone could point me to reference that states this  
that would be great? I really don't want to be saying we have support  
for processes that we need a fossil record, not just a phylogeny, to  
understand.


Cheers,

Jarrod



--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] asymmetric transitions

2012-08-16 Thread Jarrod Hadfield

Hi,

I have had a few replies off-list which have made me try and clarify  
what I mean.  I think the distinction needs to be made between two  
types of probability: the probability  that an outcome is 0 or 1 Pr(y|  
\theta) and the probability density of \theta, Pr(\theta | \gamma),  
indexed by parameter(s) \gamma.  It seems to me that in order to make  
the problem identifiable an informative prior (or an assumption) is  
required for the root state.  It seems to me that the strong prior  
Pr(\theta=0.5|\gamma) =1 is used, and then justified because int  
Pr(y=0| \theta)Pr(theta)dtheta=int Pr(y=1| \theta)Pr(theta)dtheta=0.5.  
A non-informative prior distribution for \theta could be U(0,1). This  
also induces a prior on y of the same form, int Pr(y=0|  
\theta)Pr(theta)dtheta=int Pr(y=1| \theta)Pr(theta)dtheta=0.5, but  
that is not the main motivation for choosing U(0,1).


For example, is this not worrying:

library(ape)
n-100
tree-rcoal(n) # random tree
y-rbinom(n, 1, 0.2)  # random data unconnected to the tree
m1-ace(y, tree, type = d, model=SYM)
m2-ace(y, tree, type = d, model=ARD)
anova(m1, m2) # asymmetric evolutionary transition rates strongly  
supported


y-rbinom(n, 1, 0.5)  # random data unconnected to the tree but p=0.5
m1-ace(y, tree, type = d, model=SYM)
m2-ace(y, tree, type = d, model=ARD)
anova(m1, m2) # asymmetric evolutionary transition not supported

Cheers,

Jarrod






Quoting Jarrod Hadfield j.hadfi...@ed.ac.uk on Thu, 16 Aug 2012  
12:30:30 +0100:



Hi,

I have been helping someone with some analyses and came across some  
routines to estimate asymmetric transition rates between discrete  
characters. This surprised me because its fairly straightforward to  
prove that asymmetric transition rates cannot be identified using  
data collected on the tips of a phylogeny. However when I run these  
routines (e.g. ace) likelihood ratio tests often suggest strong  
support for asymmetric rates. This seems to arise because there is  
an implicit (and unjustified) prior point mass on the ancestral  
state *probability*. If anyone could point me to reference that  
states this that would be great? I really don't want to be saying we  
have support for processes that we need a fossil record, not just a  
phylogeny, to understand.


Cheers,

Jarrod



--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo






--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo


Re: [R-sig-phylo] asymmetric transitions

2012-08-16 Thread Jarrod Hadfield

Hi,

I can see that there is information when the rates depend on  
speciation, because there is variation in the number of speciation  
events the species have experienced in their evolutionary history  
(from the root). However, if it is a purely time based process there  
is no variation in time across taxa so its hard to see where the  
information is coming from.


The following is a symmetric phylogenetically explicit process but the  
test is significant at alpha=0.05 ~75% of the time (it may be higher:  
the lower likelihood for the ARD vs SYM models in some cases suggests  
an algorithmic problem):


library(MCMCglmm)
n-100
tree-rcoal(n)
root-c(-1)
nu-rbv(tree, 1, nodes=TIPS)+root  # generate BM evolution of  
probability of being 1 on latent scale


y-rbinom(n, 1, plogis(nu))

m1-ace(y, tree, type = d, model=SYM)
m2-ace(y, tree, type = d, model=ARD)
anova(m1, m2)

Making the root further from 0 (0.5 on the probability scale) gives  
higher type I error rate, up to ~100%


Thanks for the references -  I'll try and make sense of it all!

Cheers,

Jarrod





Quoting Matt Pennell mwpenn...@gmail.com on Thu, 16 Aug 2012 11:04:48 -0400:


Jarrod and Dan,

While I see what Dan is saying and I agree that evaluating this with
non-phylogenetic data is not entirely useful, I think you have stumbled
upon a known issue but one that is not widely appreciated.

While the MK model is a useful model for discrete characters in many ways,
it may be inappropriate for such a test. If you assume that the root is
equally likely to be in either state with a lot of tips (i.e. approaching
the limit), the ML estimate for the ratio of q01 (transition rate from 0 to
1) to q10 will converge on the ratio between number of tips in state 1 to
number of tips in state 0. So if you have a tree where most of the tips are
in state 1 then you will get support for the asymmetric change model as
this best explains the data. It is a very weird problem and perhaps I am
incorrect in this.

Regardless, this result does not hold if the discrete character is modeled
simulataneously with the branching process of the phylogeny (i.e. BiSSE;
Maddison et al. 2007).

So, in summation, I mostly agree with you. But this is not shocking
behavior of the model. If you are interested, a Bayesian implementation of
the Mk model can be found in the package diversitree in the function make.mk
.

again, i could be off base here. curious to see what others think?

matt

On Thu, Aug 16, 2012 at 10:58 AM, Dan Rabosky drabo...@umich.edu wrote:



HI Jarrod-

It isn't immediately obvious to me why the exercise below reflects
something problematic. In the first scenario, you have a random binary
state but with strong differences in frequency. Because there is
effectively no phylogenetic signal (as data are simply random), this
suggests a high rate of transition between states. I think that such
asymmetry in frequencies would be highly unlikely under a symmetric model
of character change regardless of the root state. I think it is useful to
think about whether a symmetric process could have given rise to a truly
random distribution of tip data with strong frequency differences - my
intuition is that this is highly unlikely. However, I would be happy to
know what others think.

Cheers,
~Dan


On Aug 16, 2012, at 10:09 AM, Jarrod Hadfield wrote:

 Hi,

 I have had a few replies off-list which have made me try and clarify
what I mean.  I think the distinction needs to be made between two types of
probability: the probability  that an outcome is 0 or 1 Pr(y| \theta) and
the probability density of \theta, Pr(\theta | \gamma), indexed by
parameter(s) \gamma.  It seems to me that in order to make the problem
identifiable an informative prior (or an assumption) is required for the
root state.  It seems to me that the strong prior Pr(\theta=0.5|\gamma) =1
is used, and then justified because int Pr(y=0| \theta)Pr(theta)dtheta=int
Pr(y=1| \theta)Pr(theta)dtheta=0.5. A non-informative prior distribution
for \theta could be U(0,1). This also induces a prior on y of the same
form, int Pr(y=0| \theta)Pr(theta)dtheta=int Pr(y=1|
\theta)Pr(theta)dtheta=0.5, but that is not the main motivation for
choosing U(0,1).

 For example, is this not worrying:

 library(ape)
 n-100
 tree-rcoal(n) # random tree
 y-rbinom(n, 1, 0.2)  # random data unconnected to the tree
 m1-ace(y, tree, type = d, model=SYM)
 m2-ace(y, tree, type = d, model=ARD)
 anova(m1, m2) # asymmetric evolutionary transition rates strongly
supported

 y-rbinom(n, 1, 0.5)  # random data unconnected to the tree but p=0.5
 m1-ace(y, tree, type = d, model=SYM)
 m2-ace(y, tree, type = d, model=ARD)
 anova(m1, m2) # asymmetric evolutionary transition not supported

 Cheers,

 Jarrod






 Quoting Jarrod Hadfield j.hadfi...@ed.ac.uk on Thu, 16 Aug 2012
12:30:30 +0100:

 Hi,

 I have been helping someone with some analyses and came across some
routines to estimate asymmetric transition rates between discrete

Re: [R-sig-phylo] [R-sig-ME] Phylogenetic meta-analysis and setting animal variable in MCMCglmm

2009-09-18 Thread Jarrod Hadfield

Dear Wayne,

This is my fault. With phylogenies the ancestral nodes are treated as  
missing data and so I set their measurement error to an arbitrary  
value. The code for working out how many new measurement errors  
there are  was incorrect.


L98 of MCMCglmm.R should read

mev-c(mev, rep(1, dim(missing.combinations)[1]))

not

mev-c(mev, rep(1, length(missing.combinations)))

I'll change this is in the next version.  In the mean time there are  
two work arounds that should give exactly the same results:


A)

specify nodes=TIPS in the MCMCglmm function. This avoids augmenting  
with internal nodes, but can be much slower because the correlation  
structure is no longer sparse. The mixing properties can be better.


B)



Include the random effect

idh(sqrt(mev)):units  or idh(sqrt(RGR_meanVAR)):units in your case.

and set the variance for this term to 1:

prior$G$G2-list(V=1, n=0.002, fix=1)

 This is equivalent because the random design matrix Z is diagonal  
matrix with the standard errors on the diagonal.  vZZ' defines the  
expected covariance structure of the measurement errors, and since v=1  
this is equal to independent measurement errors with variance equal o  
mev.


Cheers,

Jarrod






On 18 Sep 2009, at 09:39, Dawson Wayne wrote:


Dear R-Sig-Me and R-Sig-Phylo users,

I have recently started using MCMCglmm with the aim of conducting  
phylogenetic meta-analysis. I have managed to run a basic model  
using MCMCglmm(), with weak uninformative priors specified, but I am  
struggling to work out how to incorporate a phylogeny. In  
particular, I am having problems setting the animal variable.

So, here is my set-up:

I have 123 species (plants), and my response variable is mean  
relative growth rate (RGR_mean), with associated variances  
(RGR_meanVAR). I want to see if the growth rates are related to  
plant invasiveness, which is (for the moment) the number of  
references in the global compendium of weeds (gcwrefs). I  
constructed a tree (tree)using phylomatic, which has some  
polytomies, and it is non-ultrametric.


This was my session:

sessionInfo()
R version 2.9.1 (2009-06-26)
i386-pc-mingw32

locale:
LC_COLLATE=English_United Kingdom.1252;LC_CTYPE=English_United  
Kingdom.1252;LC_MONETARY=English_United Kingdom. 
1252;LC_NUMERIC=C;LC_TIME=English_United Kingdom.1252


attached base packages:
[1] stats graphics  grDevices utils datasets  methods   base

other attached packages:
[1] MCMCglmm_1.11  gtools_2.6.1   combinat_0.0-6  
orthopolynom_1.0-1
[5] polynom_1.3-6  pscl_1.03  mvtnorm_0.9-7   
coda_0.13-4
[9] Matrix_0.999375-30 lattice_0.17-25tensorA_0.31
corpcor_1.5.2

[13] MASS_7.2-48ape_2.3-2

loaded via a namespace (and not attached):
[1] gee_4.13-14 grid_2.9.1  nlme_3.1-93

#reading the data file and the tree

rgrgrime-read.table(rgr_grime.txt,header=T)
rgr2-subset(rgrgrime,gcwrefs!=0)
attach(rgr2)

tree-read.tree(rgr_testtree.txt)
tree2- 
drop 
.tip 
(tree 
,c 
(Dryas_octopetala 
,Helianthemum_chamaecistus 
,Helictotrichon_pratense 
,Zerna_erecta,Picea_nigra,Pinus_sylvestris))


Some species had to be removed before analysis, but the end no. was  
123, and the data-file was in the same order as the tree tips.


So, the basic meta-analysis was (with default for burnin and no. of  
simulations)-


prior = list (R = list (V = 1,n = 1, fix = 1),G = list (G1 = list  
( V = 1, n = 1)))





(using a weak prior as in the vignette tutorial- I am uncertain as  
to how I might change the priors to something more appropriate to my  
data, any recommendations for further reading on this would be great)-


model-MCMCglmm (RGR_mean ~ sqrt (gcwrefs), random = ~ species_name,  
mev = RGR_meanVAR, prior = prior, data = rgr2)


This model converges, I've checked the autocorr statistics, and the  
posterior distributions, and they seem ok.


Then I want to add phylogeny in the pedigree= argument, and need  
to define the animal variable. In the MCMCglmm vignette, Jarrod says  
the animal variable will always be associated with the id levels in  
the first column of the pedigree table. But what will they be  
associated with if you pass a phylogeny through the pedigree  
argument? Logically, I thought I would make animal equal the  
phylogeny tip names-


rgr2$animal-tree2$tip

Then I tried the model-

model2-MCMCglmm(RGR_mean ~sqrt(gcwrefs), random = ~animal, mev =  
RGR_meanVAR, pedigree = tree2, prior = prior, data = rgr2, scale=F)


But I get the following error message-

Error in `$-.data.frame`(`*tmp*`, MCMC_mev, value =  
c(0.223606797749979,  :

 replacement has 252 rows, data has 188
In addition: Warning message:
In MCMCglmm(RGR_mean ~ sqrt(gcwrefs), random = ~animal, mev =  
RGR_meanVAR,  :
 some combinations in animal do not exist and 64 missing records  
have been generated


So, has anyone got any idea what I'm doing wrong here? Apologies if  
I'm being dumb, but I did only start using MCMCglmm yesterday, and I  
can't find