Re: [R-sig-phylo] phytools densityMap tree paint back to binary state

2017-02-25 Thread marko . djurakic

Dear Liam,
thank you very much for the response! This is really helpful for me, 
especially the part related to the appropriateness of using this 
approach. But it introduces more questions than solutions :) Regardless 
of that, I very appreciate your efforts to prepare and distribute posts 
on the phytools blog! Thank you for that.


You raised caveats regarding the approach I asked, but now I am puzzled 
what to do with the uncertainty of tips in the phylogeny...


My main aim is to compare the evolution of a continuous trait in regard 
to two regimes (aquatic and terrestrial). However, defining these two 
states is not a straightforward task since some species tend to be 
aquatic and terrestrial at the same time with tendencies towards one or 
the other state. All of them can be considered as "semi-aquatic" (but 
"semi-terrestrial" is ok too I'd say...). Likewise, few species lack 
information about habitat preferences. So, I wanted to include such 
uncertainty into the analysis. I am not able to make e.g. 100 maps and 
use each map as an input in model fitting (we did that way in the course 
in Barcelona in 2016 where you was the instructor) because I am doing an 
exhausting multivariate model fitting and computation is super long (+ 
failed approaches I performed as I am beginner)...
Nevertheless, I came to the idea to include uncertainty of tip states 
into stochastic mapping, obtain a single tree and use it for further 
analysis.
Though there is no explanation in the phytools package, I suspected that 
if one does following will get PP of tips being in particular state:

mtrees <- [set of stochastic maps]
XX <- describe.simmap (mtrees)
XX$tips ## this will give PP of each tip being in a particular state

If this is correct, then the stochastic mapping can be used not just to 
estimate ancestral states, but also PP of tips given a model and data. 
This involves some circularity, but otherwise, I do not know what are 
the values of XX$tips and how we can use/interpret them...Please, can 
you provide some comment for this?


Note that I set priors for 3 states, but obj <- densityMap (mtrees) 
returns a tree where tips are assigned either as "aquatic" or 
"terrestrial". So obj tree has two states what I actually needed, though 
I am not sure if this is the correct way of thinking... The distribution 
of two states is extremely similar to that if one would assign 
"semiaquatic" as "aquatic". However, there are few species estimated as 
"terrestrial" even I set the greatest prior to "semi-aquatic" state. I 
believe that this approach (if the rate of character evolution is 
relatively low as you pointed out) can be used as justification why 
particular tip are assigned to certain states. Imagine that one aims to 
test a bunch of models of continuous trait evolution allowing change of 
mode and tempo across two discrete ecological states, but ecological 
states for some tips are not certain or are unknown. Should we exclude 
species with uncertain states? Or, should we still guess a value of 
their states?


Please, can you provide some reproducible code that compares if marginal 
posterior probabilities and the joint likelihood of states are 
more-or-less similar (case when the rate of character evolution is 
relatively low as you pointed out) and if they are, how to obtain 
binary-colored tree after densityMap function? I tried later but I 
failed to accomplish it...


I am sorry for the long email.
Marko

On 2017-02-25 01:03, Liam J. Revell wrote:

Hi Marko.

This is possible to do, of course. We could just traverse the tree,
find the state with the highest posterior probability at each node,
and paint the edge by that state using phytools::paintBranches. We
could even (with a bit more difficult) traverse our "densityMap"
object and find the state with the highest PP at the midpoint of each
edge.

However, I might counsel against it for the following reasons.

Firstly, doing so (that is, identifying a set of values at nodes with
highest posterior probability at each node) ignores the substantial
uncertainty that can be associated with ancestral character
estimation. For instance, if we imagine that a set of internal nodes
all have posterior probability of state "A" more or less equal to but
slightly greater than 0.5 (for a binary character), we would
'reconstruct' all of these nodes to be in state "A" - but this would
actually be virtually meaningless with respect to our confidence that
these nodes are indeed in state "A".

Secondly, to the extent that we *do* want one set of states for all
internal nodes in the tree, a reasonable choice for these states would
be those that maximized the probability of the data (that is, the
Maximum Likelihood states). Unfortunately, these are not guaranteed to
be the states that have the highest marginal posterior probability at
all the nodes of the tree. (This is explained in various places,
including in the book 'Computational Molecular Evolution' by Yang.)

On the o

Re: [R-sig-phylo] phytools densityMap tree paint back to binary state

2017-02-24 Thread Liam J. Revell

Hi Marko.

This is possible to do, of course. We could just traverse the tree, find 
the state with the highest posterior probability at each node, and paint 
the edge by that state using phytools::paintBranches. We could even 
(with a bit more difficult) traverse our "densityMap" object and find 
the state with the highest PP at the midpoint of each edge.


However, I might counsel against it for the following reasons.

Firstly, doing so (that is, identifying a set of values at nodes with 
highest posterior probability at each node) ignores the substantial 
uncertainty that can be associated with ancestral character estimation. 
For instance, if we imagine that a set of internal nodes all have 
posterior probability of state "A" more or less equal to but slightly 
greater than 0.5 (for a binary character), we would 'reconstruct' all of 
these nodes to be in state "A" - but this would actually be virtually 
meaningless with respect to our confidence that these nodes are indeed 
in state "A".


Secondly, to the extent that we *do* want one set of states for all 
internal nodes in the tree, a reasonable choice for these states would 
be those that maximized the probability of the data (that is, the 
Maximum Likelihood states). Unfortunately, these are not guaranteed to 
be the states that have the highest marginal posterior probability at 
all the nodes of the tree. (This is explained in various places, 
including in the book 'Computational Molecular Evolution' by Yang.)


On the other hand, if the rate of character evolution for our character 
is relatively low then our marginal posterior probabilities and joint 
likelihood states will likely more-or-less coincide, and these are also 
probably quite similar to our MP states. In that case it would perhaps 
be reasonable to summarize our reconstructed character histories in the 
way you've proposed.


I hope these comments are helpful - and thank you for using phytools!

All the best, Liam

Liam J. Revell, Associate Professor of Biology
University of Massachusetts Boston
web: http://faculty.umb.edu/liam.revell/
email: liam.rev...@umb.edu
blog: http://blog.phytools.org

On 2/24/2017 6:38 PM, marko.djura...@dbe.uns.ac.rs wrote:

Dear r-sig-phylo participants,

I have data set where tips are assigned to 3 discrete states (aquatic,
semi-aquatic and terrestrial). Because the definition of "semi-aquatic"
is quite arbitrary and some species in the tree lack field observations
of their ecology, I decided to use a matrix of state priors for
stochastic mapping in the phytools package. That approach will allow me
to account for uncertainties/lack of information for some tips in the
phylogeny. I fitted 3 models (ER, SYM, and ARD) where Q was empirically
estimated and nsim was set to 1000. According to AIC value, the SYM
model was the best-fitted one. Describe.simmap showed that mean total
time spent in the state "semi-aquatic" was 0. Thus, all mapped trees
were actually binary and I was able to employ the densityMap function to
obtain an object, let's say "obj", which contains a single tree with the
posterior density for the "aquatic" and "terrestrial" states from 1000
stochastic maps.

Here is the question:
Is it straightforward idea to paint back the obj$tree with just two
colors where colors are determined by a threshold value of posterior
probability (indicated as a legend bar in the bottom left part of the
graph)?
For instance, is it appropriate to paint a tree edges with a color A if
PP is lower or equal than 0.5 and color B if PP is greater than 0.5?

Some R packages for model fitting allow simmap tree as input, but if my
question makes sense, it would be better to provide consensus tree from
n stochastic maps instead to use one stochastic map as input.

Thank you for your time.
Kind regards,
Marko

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at
http://www.mail-archive.com/r-sig-phylo@r-project.org/


___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


[R-sig-phylo] phytools densityMap tree paint back to binary state

2017-02-24 Thread marko . djurakic

Dear r-sig-phylo participants,

I have data set where tips are assigned to 3 discrete states (aquatic, 
semi-aquatic and terrestrial). Because the definition of "semi-aquatic" 
is quite arbitrary and some species in the tree lack field observations 
of their ecology, I decided to use a matrix of state priors for 
stochastic mapping in the phytools package. That approach will allow me 
to account for uncertainties/lack of information for some tips in the 
phylogeny. I fitted 3 models (ER, SYM, and ARD) where Q was empirically 
estimated and nsim was set to 1000. According to AIC value, the SYM 
model was the best-fitted one. Describe.simmap showed that mean total 
time spent in the state "semi-aquatic" was 0. Thus, all mapped trees 
were actually binary and I was able to employ the densityMap function to 
obtain an object, let's say "obj", which contains a single tree with the 
posterior density for the "aquatic" and "terrestrial" states from 1000 
stochastic maps.


Here is the question:
Is it straightforward idea to paint back the obj$tree with just two 
colors where colors are determined by a threshold value of posterior 
probability (indicated as a legend bar in the bottom left part of the 
graph)?
For instance, is it appropriate to paint a tree edges with a color A if 
PP is lower or equal than 0.5 and color B if PP is greater than 0.5?


Some R packages for model fitting allow simmap tree as input, but if my 
question makes sense, it would be better to provide consensus tree from 
n stochastic maps instead to use one stochastic map as input.


Thank you for your time.
Kind regards,
Marko

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/