An old parsimony-based approach to this problem is known as stratocladistics. There are no R implementations that I know of, but I imagine you could build one on top of phangorn, though it would require writing a new function. Something like this:
    library(ape)       # rtree()
    library(phangorn)  # parsimony(), rSPR()

    # Inputs you would supply:
    #   my.taxa.vector: character vector of tip labels
    #   data:           phyDat object with the characters
    #   times:          sampling time point of each tip
    #   nsteps:         number of search iterations
    #   strato():       the new stratigraphic-debt function (still to be written)
    best.phy <- rtree(length(my.taxa.vector), tip.label = my.taxa.vector)
    best.trees <- list(best.phy)
    best.score <- parsimony(best.phy, data) + strato(best.phy, times)
    for (i in seq_len(nsteps)) {
      new.phy <- rSPR(best.trees[[1]])
      new.score <- parsimony(new.phy, data) + strato(new.phy, times)
      if (new.score < best.score) {
        # strictly better: start a new set of best trees
        best.trees <- list(new.phy)
        best.score <- new.score
      } else if (new.score == best.score) {
        # tie: keep it alongside the other equally good trees
        best.trees <- c(list(new.phy), best.trees)
      }
    }

This would do a greedy SPR search; you'd want to restart from different starting trees and so on. The only tricky part is writing the new strato function: try a set of branch lengths and take as the score the one that implies the least stratigraphic debt (ape's node.height function could be useful for this). One thing that makes this easy is that times and node heights must be integers (number of time points from the root), so even an exhaustive search is feasible, though almost surely unnecessary. Some of these branch lengths could be zero; a zero-length terminal branch indicates a sample that is a direct ancestor of another sample. The tree isn't quite rooted, but it is polarized, with nodes sampled further back in time pushed down the tree. Writing the strato function really wouldn't be that bad.

Another approach is to do a likelihood search assuming a clock, but with tips constrained to occur at their time of sampling rather than being coeval. Heibl and Cusimano's Lagopus package (not on CRAN; see http://www.christophheibl.de/mdt/mdtinr.html) calls PAML and multidivtime to estimate a tree with age constraints. Multidivtime can use constraints such that the tips are not coeval, but I'm not certain that Lagopus can pass this information (it can certainly do node constraints; I'm just not sure about tip constraints).
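Going back to the parsimony route for a moment, here is a minimal sketch of what a strato() function might look like. It is hypothetical (not an existing function in ape or phangorn) and uses a simplified proxy for stratigraphic debt rather than a full stratocladistic definition: each internal node is placed as late as its descendants allow, i.e. at the earliest sampling time in its clade, and the score is the total implied branch length in sampling intervals. Raising an internal node with two or more children shortens its child edges more than it lengthens the edge above it, so under this simplified score that placement is already optimal and no search over integer branch lengths is needed. It assumes a rooted phylo object whose root placement is meaningful, and a named vector of integer sampling times.

```r
library(ape)  # read.tree(), reorder.phylo()

# Hypothetical strato(): NOT an existing package function. A simplified
# stand-in for stratigraphic debt on a rooted tree.
#   phy:   rooted "phylo" object
#   times: named vector of integer sampling time points (0 = earliest),
#          names matching phy$tip.label
strato <- function(phy, times) {
  stopifnot(all(phy$tip.label %in% names(times)))
  phy <- reorder(phy, "postorder")  # a node's child edges come before its own parent edge
  n.tip <- length(phy$tip.label)
  node.time <- rep(Inf, n.tip + phy$Nnode)
  node.time[seq_len(n.tip)] <- times[phy$tip.label]
  # Place every internal node as late as possible: at the earliest
  # sampling time found among its descendants.
  for (i in seq_len(nrow(phy$edge))) {
    parent <- phy$edge[i, 1]
    child <- phy$edge[i, 2]
    node.time[parent] <- min(node.time[parent], node.time[child])
  }
  # Debt proxy: total implied branch length, in sampling intervals.
  # A zero-length terminal branch flags a candidate sampled ancestor.
  sum(node.time[phy$edge[, 2]] - node.time[phy$edge[, 1]])
}

times <- c(A = 2, B = 3, C = 0)
strato(read.tree(text = "((A,B),C);"), times)  # 3
strato(read.tree(text = "((A,C),B);"), times)  # 5
```

Here the first tree implies less debt, and its zero-length terminal branches (to C, and to A within the (A,B) clade) mark those samples as candidate direct ancestors. Because the score depends on where the root sits, trees coming out of rSPR would need a sensible rooting before scoring.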
If Lagopus can pass tip constraints, or could be modified to do so, this would give you a tree with samples constrained to be at the right times and with possibly zero-length branches for truly ancestral samples. You could then collapse these using di2multi in ape.

Someone else may know of other ways to attack this problem.

Hope this helps,
Brian

_______________________________________
Brian O'Meara
Assistant Professor
Dept. of Ecology & Evolutionary Biology
U. of Tennessee, Knoxville
http://www.brianomeara.info

Students wanted: Applications due Dec. 15, annually
Postdoc collaborators wanted: Check NIMBioS' website
Calendar: http://www.brianomeara.info/calendars/omeara

On Thu, Feb 28, 2013 at 3:21 PM, <sa...@math.berkeley.edu> wrote:
> Estimating Ongoing Evolution by Repeated Sampling with "Long" Time Intervals.
>
> Is there a way to construct dendrograms similar to those used in phylogenetics, but with two main differences:
> (1) Instead of observing at one time, small samples from a very large population are taken at regular intervals, so that some observed cells could easily correspond to an internal node rather than a leaf.
> (2) There is no obvious outgroup; the root should, if possible, be estimated by presuming that observations from later time points are on average farther from the root.
>
> More specifically, consider a large, heterogeneous, unstably evolving in vitro cell culture apparently not subject to a Hayflick limit. In our feasibility study, a sample of 20 cells was tested at t=0 for about 100 different numerical aspects of their karyotype (for each cell an ordered vector of 100 numbers is measured from the genome; the individual numbers all have the same order of magnitude).
>
> About 15 cell generations later the observation is repeated, and similarly four more times, for a total of 120 cells over a time span of about 60 cell generations. I would like to estimate the behavior of the major subclones. Are some spinning off new karyotypes?
> Which ones, if any, are in the process of taking over? Are some being outcompeted? And so on.
>
> Various difference matrices and binary dendrograms with the cells as leaves are easily constructed and are suggestive. For example, at timepoint 5 one karyotype which was prominent, with lots of duplicates, for timepoints 1-4 disappears from the samples. But the dendrograms themselves don't really use the fact that observations were made at six consecutive times rather than simultaneously, and they require me to make a guess about where the root is. There must be a better way to use the data. I assume people who work, say, on the development of drug-resistant bacterial lineages have thought this through in some detail and developed R software for it, but I wasn't able to locate anything.
>
> Thanks in advance,
> Ray Sachs, Dept. Math, UCB
>
> _______________________________________________
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/