Re: [R-sig-phylo] Codifying missing data and polymorphic state of characters in the same matrix

2021-09-04 Thread Liam J. Revell

Dear Felipe.

My suggestion would be to code the polymorphic condition as a third 
state -- so (if your trait is binary) you have three different character 
states: 0, 0+1, and 1.


Then you would enter your data as follows:

states: [0, 0+1, 1]
definitely in state zero: [1.0, 0.0, 0.0]
definitely in state one: [0.0, 0.0, 1.0]
definitely polymorphic: [0.0, 1.0, 0.0]
totally unknown: [0.33, 0.33, 0.33]
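To illustrate the bookkeeping, here is a minimal sketch. It is written in Python rather than R, the taxon names and the check_priors helper are hypothetical, and it is not phytools code. The prior table has one row per taxon and one column per state, in the order 0, 0+1, 1. It seems safest to keep each row summing to one. Exact thirds do this for the unknown taxon, whereas 0.33 + 0.33 + 0.33 = 0.99.

```python
from fractions import Fraction

# Column order follows the coding above: 0, 0+1, 1.
STATES = ["0", "0+1", "1"]

# Hypothetical taxa, one per case in the table above.
# Fractions keep "totally unknown" at exactly 1/3 per state.
priors = {
    "taxon_absent":      [Fraction(1), Fraction(0), Fraction(0)],
    "taxon_present":     [Fraction(0), Fraction(0), Fraction(1)],
    "taxon_polymorphic": [Fraction(0), Fraction(1), Fraction(0)],
    "taxon_unknown":     [Fraction(1, 3)] * 3,
}

def check_priors(table, n_states=len(STATES)):
    """Hypothetical helper: raise ValueError unless every row has one
    non-negative entry per state and the entries sum to exactly 1."""
    for taxon, row in table.items():
        if len(row) != n_states:
            raise ValueError(f"{taxon}: expected {n_states} columns, got {len(row)}")
        if any(p < 0 for p in row):
            raise ValueError(f"{taxon}: negative probability")
        if sum(row) != 1:
            raise ValueError(f"{taxon}: row sums to {sum(row)}, not 1")
    return True

check_priors(priors)
```

The check is strict on purpose: a float row such as [0.33, 0.33, 0.33] fails it. In R, the same table would be a matrix with tip labels as row names and the three states as column names.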

You can likewise encode other types of uncertainty about the condition 
of the trait. For instance, if you have observed state 0 in a taxon but 
have relatively little information, then you might want to say that the 
taxon is *either* in state 0 or 0+1. (The converse would obviously be true
for state 1.) This could be encoded as follows:


either state 0 or 0+1: [0.5, 0.5, 0.0]
either state 0+1 or 1: [0.0, 0.5, 0.5]
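Each of these "either/or" rows spreads probability evenly over the allowed states and puts zero on the rest. A short Python sketch can express this; the uniform_over helper is hypothetical and only illustrates the arithmetic. Applied to all three states, it reproduces the "totally unknown" row.

```python
from fractions import Fraction

# Column order follows the coding above: 0, 0+1, 1.
STATES = ["0", "0+1", "1"]

def uniform_over(allowed, states=STATES):
    """Hypothetical helper: equal probability on each allowed state, zero elsewhere."""
    allowed = set(allowed)
    if not allowed or allowed - set(states):
        raise ValueError(f"allowed states must be a non-empty subset of {states}")
    weight = Fraction(1, len(allowed))
    return [weight if s in allowed else Fraction(0) for s in states]

print(uniform_over(["0", "0+1"]))  # either state 0 or 0+1
print(uniform_over(["0+1", "1"]))  # either state 0+1 or 1
print(uniform_over(STATES))        # totally unknown
```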

There's some grey area here too. For instance, how many observations are 
required to conclude that a taxon is "definitely in state zero" or 
"definitely in state one"? Perhaps you've observed only a few 
individuals for a species and all are in state 1. Is that taxon 
definitely monomorphic for the condition, or might it be polymorphic? 
make.simmap can handle this kind of nuance. For instance, you might 
decide the following for a particular taxon.


probably state 1, could be polymorphic: [0.0, 0.25, 0.75]
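A row like this is simply a set of relative weights rescaled to sum to one. Giving 0+1 a weight of 1 and state 1 a weight of 3 (3-to-1 odds) yields [0, 0.25, 0.75]. The Python sketch below shows this; prior_from_weights is a hypothetical name, and how much weight each state deserves remains your judgment.

```python
from fractions import Fraction

# Column order follows the coding above: 0, 0+1, 1.
STATES = ["0", "0+1", "1"]

def prior_from_weights(weights, states=STATES):
    """Hypothetical helper: turn relative weights {state: weight} into a
    prior row that sums to 1. States absent from `weights` get zero."""
    for s in weights:
        if s not in states:
            raise ValueError(f"unknown state {s!r}")
    row = [Fraction(weights.get(s, 0)) for s in states]
    total = sum(row)
    if total <= 0 or any(w < 0 for w in row):
        raise ValueError("weights must be non-negative with a positive total")
    return [w / total for w in row]

# "probably state 1, could be polymorphic": 3-to-1 odds for 1 over 0+1
print(prior_from_weights({"0+1": 1, "1": 3}))
```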

After you have coded your trait this way, you need to set up the model 
to fit to your data.


To do this you will have to create a design matrix -- the same kind that
is used by ape::ace or geiger::fitDiscrete. This design matrix is passed
to make.simmap as its model argument.


The key attribute of your model design matrix is that transitions from 0 
to 1 occur through the polymorphic condition 0+1.


For instance, for the trait with levels 0, 0+1, 1, your design matrix 
might have the form:


     0    0+1  1
0    0    1    0
0+1  2    0    3
1    0    4    0

This is a model in which transitions are allowed to occur 0 <-> 0+1 and
0+1 <-> 1 (but never directly 0 <-> 1), and each of the four allowed
transitions can have its own rate.
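The following Python sketch, which is not phytools code, shows what the design matrix encodes. It replaces each integer index with a rate value and sets each diagonal entry so that every row sums to zero, which produces an instantaneous rate matrix Q. The sketch assumes rows are "from" states and columns are "to" states, and the four rate values are invented. It then approximates the transition probabilities P(t) = exp(Qt) over a short branch using a truncated Taylor series. Although Q has no direct 0 -> 1 entry, the probability of ending in state 1 after starting in state 0 is still positive, because that change passes through 0+1.

```python
# Column and row order: 0, 0+1, 1.
STATES = ["0", "0+1", "1"]

# Design matrix from the message: 0 = transition not allowed,
# k > 0 = index of the k-th free rate (assumed rows = from, columns = to).
DESIGN = [
    [0, 1, 0],
    [2, 0, 3],
    [0, 4, 0],
]

def rate_matrix(design, rates):
    """Off-diagonal (i, j) = rates[design[i][j] - 1] where allowed, else 0;
    each diagonal entry is set so that its row sums to zero."""
    n = len(design)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and design[i][j] > 0:
                q[i][j] = rates[design[i][j] - 1]
        q[i][i] = -sum(q[i][j] for j in range(n) if j != i)
    return q

def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transition_probs(q, t, terms=30):
    """P(t) = exp(Q t) via a truncated Taylor series (fine for small Q t)."""
    n = len(q)
    qt = [[x * t for x in row] for row in q]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term now holds (Q t)^k / k!
        term = [[x / k for x in row] for row in matmul(term, qt)]
        result = [[r + s for r, s in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result

# Hypothetical rates: 1: 0->0+1, 2: 0+1->0, 3: 0+1->1, 4: 1->0+1.
q = rate_matrix(DESIGN, rates=[0.5, 1.0, 2.0, 0.3])
p = transition_probs(q, t=0.1)
print(q[0][2], p[0][2])  # direct 0->1 rate is 0, yet P(0 -> 1) over t > 0 is positive
```

For real analyses, make.simmap estimates Q itself (or accepts a fixed one). The series above is only meant to show how the indices become rates.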


Finally, after running your stochastic mapping analysis, you can also
merge states back together to compute statistics of interest -- such as
the total posterior probability of a state at any node. For instance,
you could merge states 0 and 0+1 to get the probability that state 0 is
present at a node (either alone or as part of the polymorphism). This can
be done using phytools::mergeMappedStates and then running the "simmap"
summary methods as normal.
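Merging states amounts to adding their posterior probabilities. This minimal Python sketch assumes you already have node posteriors for the three states. The numbers are invented and merge_states is a hypothetical helper. In phytools, mergeMappedStates does this on the mapped trees themselves, not on a summary table like this one. As a design choice of this sketch only, the merged state is placed in the last column.

```python
# Column order follows the coding above: 0, 0+1, 1.
STATES = ["0", "0+1", "1"]

def merge_states(node_probs, old_states, new_state, states=STATES):
    """Hypothetical helper: collapse `old_states` into one `new_state`
    (placed last) by summing their probabilities at each node."""
    missing = set(old_states) - set(states)
    if missing:
        raise ValueError(f"unknown states: {sorted(missing)}")
    merged_states = [s for s in states if s not in old_states] + [new_state]
    merged = []
    for row in node_probs:
        kept = [p for s, p in zip(states, row) if s not in old_states]
        pooled = sum(p for s, p in zip(states, row) if s in old_states)
        merged.append(kept + [pooled])
    return merged_states, merged

# Invented posterior probabilities at two internal nodes (columns: 0, 0+1, 1).
nodes = [[0.6, 0.3, 0.1],
         [0.1, 0.2, 0.7]]
labels, merged = merge_states(nodes, old_states=["0", "0+1"], new_state="0 present")
print(labels)  # ['1', '0 present']
print(merged)
```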


I hope this is of some help in getting you started, Felipe.

All the best, Liam

Liam J. Revell
University of Massachusetts Boston [Assoc. Prof.]
Universidad Católica de la Ssma Concepción [Adj. Res.]

Web & phytools:
http://faculty.umb.edu/liam.revell/, http://www.phytools.org, 
http://blog.phytools.org


Academic Director UMass Boston Chile Abroad:
https://www.umb.edu/academics/caps/international/biology_chile

U.S. COVID-19 explorer web application:
https://covid19-explorer.org/

On 9/4/2021 7:36 PM, Felipe Rossetto wrote:

Hi everyone!
I am setting up a presence/absence matrix for running stochastic
mapping through the make.simmap function of phytools, but the matrix
contains both taxa with polymorphic character states and taxa with unknown data.
I know it is possible to represent missing data as probabilities 
instead of presence and absence, but I do not know how to distinguish 
missing data from polymorphic character codes for the analyses.


So, would the following coding of missing data and polymorphic
states be correct?


Taxon_A,1,0,0 (character absent)
Taxon_B,0,1,0 (character present)
Taxon_C,0.5,0.5,0.0 (polymorphic character)
Taxon_D,0.33,0.33,0.33 (missing data)

Thank you very much in advance!

Felipe Rossetto



___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


[R-sig-phylo] Codifying missing data and polymorphic state of characters in the same matrix

2021-09-04 Thread Felipe Rossetto
Hi everyone!
I am setting up a presence/absence matrix for running stochastic mapping
through the make.simmap function of phytools, but the matrix contains both
taxa with polymorphic character states and taxa with unknown data. I know it is
possible to represent missing data as probabilities instead of presence and
absence, but I do not know how to distinguish missing data from
polymorphic character codes for the analyses.

So, would the following coding of missing data and polymorphic
states be correct?

Taxon_A,1,0,0 (character absent)
Taxon_B,0,1,0 (character present)
Taxon_C,0.5,0.5,0.0 (polymorphic character)
Taxon_D,0.33,0.33,0.33 (missing data)

Thank you very much in advance!

Felipe Rossetto
