Re: [R-sig-phylo] R Dendrograms for Subclones in a Somatic Cell Population: Time Interval Sampling.

Brian O'Meara Thu, 28 Feb 2013 17:20:30 -0800

An old parsimony-based approach to this is known as stratocladistics. As
far as I know there is no R implementation, but I imagine you could build
one on top of phangorn, though it does require writing a new function.
Sketch:

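#my.taxa.vector: character vector of tip labels; data: phyDat object for
#those tips; times: sampling time point of each tip; nsteps: number of moves
library(ape)       # rtree()
library(phangorn)  # parsimony(), rSPR()
best.phy <- rtree(length(my.taxa.vector), tip.label = my.taxa.vector)
best.trees <- c(best.phy)
best.score <- parsimony(best.phy, data) + strato(best.phy, times)  # strato() still to be written
for (i in seq_len(nsteps)) {
  new.phy <- rSPR(best.trees[[1]])  # one random SPR rearrangement
  new.score <- parsimony(new.phy, data) + strato(new.phy, times)
  if (new.score == best.score) {
    best.trees <- c(new.phy, best.trees)  # tie: keep it alongside the others
  } else if (new.score < best.score) {
    best.trees <- c(new.phy)  # improvement: start a new set of best trees
    best.score <- new.score
  }
}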

This would do a greedy SPR search; you'd want to restart from different
trees and so on. The only tricky part is writing the new strato function:
try a set of branch lengths and take as the score the one that implies the
least stratigraphic debt (ape's node.height function could be useful here).
One thing that makes this easy is that times and node heights must be
integers (number of time points from the root), so even an exhaustive
search is feasible, though almost surely unnecessary. Some of these branch
lengths could be zero, indicating, in the case of a terminal branch, a
sample that is a direct ancestor of another sample. The tree isn't quite
rooted, but it is polarized, with nodes sampled further back in time pushed
down the tree. Writing the strato function really wouldn't be that hard.
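For a small tree, one possible shape for the strato function is sketched below in base R. It is hypothetical in several ways: it takes an ape-style edge matrix (with an ape tree you would pass phy$edge, with tip times in the order of phy$tip.label) rather than a phylo object, and it assumes "debt" is the number of unsampled time points strictly between each parent and child, summed over edges. That is a simplified stand-in, not a published stratocladistic score. Internal node times are searched exhaustively over the integer time points, with children not allowed to predate their parents.

```r
# Hypothetical strato(): stratigraphic debt of a rooted tree, minimised over
# integer node times. Tree = ape-style edge matrix (column 1 = parent,
# column 2 = child; tips are nodes 1..n, internal nodes are n+1 and up).
# ASSUMPTION: debt = unsampled time points strictly between parent and child,
# summed over edges. A simple illustrative choice, not a published score.

edge_debt <- function(edge, node_time) {
  span <- node_time[edge[, 2]] - node_time[edge[, 1]]
  if (any(span < 0)) return(Inf)  # polarization: a child cannot predate its parent
  sum(pmax(span - 1, 0))          # zero-length and one-step edges cost nothing
}

strato <- function(edge, tip_times) {
  n_tips <- length(tip_times)
  internal <- sort(unique(edge[, 1]))
  # Exhaustive search: each internal node takes a time in 0..max(tip_times)
  grid <- expand.grid(rep(list(0:max(tip_times)), length(internal)))
  node_time <- numeric(max(edge))
  node_time[seq_len(n_tips)] <- tip_times
  best <- Inf
  for (i in seq_len(nrow(grid))) {
    node_time[internal] <- unlist(grid[i, ])
    best <- min(best, edge_debt(edge, node_time))
  }
  best
}

# Example: ((t1, t2), (t3, t4)) with tips sampled at time points 0, 1, 3, 3
edge <- rbind(c(5, 6), c(6, 1), c(6, 2), c(5, 7), c(7, 3), c(7, 4))
tip_times <- c(0, 1, 3, 3)
strato(edge, tip_times)  # 1: node 7 at time 2 leaves one unsampled point on edge 5->7
```

The full grid costs (T+1)^m evaluations for m internal nodes and T time points, so beyond a handful of internal nodes you would want a smarter search than this.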

Another approach is to do a likelihood search assuming a clock, but with
tips constrained to occur at time of sampling rather than being coeval.
Heibl and Cusimano's Lagopus package (not on CRAN, go to
http://www.christophheibl.de/mdt/mdtinr.html) calls PAML and multidivtime
to estimate a tree with age constraints. Multidivtime can use constraints
such that the tips are not coeval, but I'm not certain that Lagopus can
pass this information (it can certainly do node constraints, just not sure
about tip constraints). If it can, or could be modified to do so, this
would give you a tree with samples constrained to be at the right times and
with possibly zero length branches for truly ancestral samples. You could
then collapse these using di2multi in ape.
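When a fitted tree comes back with zero-length branches, a short base-R sketch like the one below can list the sampled cells whose terminal branch is numerically zero, i.e. candidate direct ancestors. The function name direct_ancestors() and the tolerance of 1e-8 are assumptions for illustration; with an ape tree you would pass phy$edge and phy$edge.length.

```r
# Hypothetical helper: list tips whose terminal branch length is (numerically)
# zero. Such a sampled cell sits on its parent node, so it is a candidate
# direct ancestor of the other samples descending from that node.
# Edge matrix follows the ape convention: tips are nodes 1..n_tips.
direct_ancestors <- function(edge, edge_length, n_tips, tol = 1e-8) {
  terminal <- edge[, 2] <= n_tips
  sort(edge[terminal & edge_length < tol, 2])
}

# Example: ((t1:0, t2:0.5):1, (t3:1, t4:1):2); written as an edge matrix
edge <- rbind(c(5, 6), c(6, 1), c(6, 2), c(5, 7), c(7, 3), c(7, 4))
edge_length <- c(1, 0, 0.5, 2, 1, 1)
direct_ancestors(edge, edge_length, n_tips = 4)  # 1, i.e. t1
```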

Someone else may know of other ways to attack this problem.

Hope this helps,
Brian

_______________________________________
Brian O'Meara
Assistant Professor
Dept. of Ecology & Evolutionary Biology
U. of Tennessee, Knoxville
http://www.brianomeara.info

Students wanted: Applications due Dec. 15, annually
Postdoc collaborators wanted: Check NIMBioS' website
Calendar: http://www.brianomeara.info/calendars/omeara

On Thu, Feb 28, 2013 at 3:21 PM, <sa...@math.berkeley.edu> wrote:

> Estimating Ongoing Evolution by Repeated Sampling with "Long" Time
> Intervals.
>
> Is there a way to construct dendrograms similar to those used in
> phylogenetics but with 2 main differences:
>      (1) Instead of observing at one time, small samples from a very large
> population are taken at regular intervals, so that some observed cells
> could easily correspond to an internal node rather than a leaf.
>      (2) There is no obvious outgroup; the root should, if possible, be
> estimated by presuming that observations from later time points are on
> average farther from the root.
>
> More specifically, consider a large, heterogeneous, unstably evolving in
> vitro cell culture apparently not subject to a Hayflick limit. In our
> feasibility study, a sample of 20 cells was tested at t=0 for about 100
> different numerical aspects of their karyotype (for each cell an ordered
> vector of 100 numbers is measured from the genome; the individual numbers
> all have the same order of magnitude).
>
> About 15 cell generations later the observation is repeated, and similarly
> four more times, for a total of 120 cells over a time span of about 60 cell
> generations. I would like to estimate the behavior of the major subclones:
> Are some spinning off new karyotypes? Which ones, if any, are in the
> process of taking over? Are some being outcompeted? And so on.
>
> Various difference matrices and binary dendrograms with the cells as
> leaves are easily constructed and are suggestive. For example, at
> timepoint 5 one karyotype which was prominent, with lots of duplicates, for
> timepoints 1-4 disappears from the samples. But the dendrograms themselves
> don't really use the fact that observations were made at six consecutive
> times rather than simultaneously, and they require me to guess where the
> root is. There must be a better way to use the data. I assume people who
> work, say, on the development of drug-resistant bacterial lineages have
> thought this through in some detail and developed R software for it, but
> I wasn't able to locate anything.
>
> Thanks in Advance, Ray Sachs, Dept. Math, UCB
>
> _______________________________________________
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at
> http://www.mail-archive.com/r-sig-phylo@r-project.org/
