Re: [R-sig-phylo] problem with drop.tip

2013-02-28 Thread Klaus Schliep
Dear John,

Can you please be a bit more specific about your error message? It is
always good to have a reproducible example, e.g. a tree for which
drop.tip fails, and to run traceback() just after the error to
get more information on where the error occurred. It is also useful to
include the version of ape and your operating system.
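
A minimal sketch of how that information could be collected. The helper
failing_indices is a hypothetical name, not an ape function, and the toy
list only stands in for the real list of trees:

```r
# Hypothetical helper (not part of ape): return the positions in a list
# for which f() throws an error.
failing_indices <- function(xs, f) {
  failed <- vapply(xs, function(x) {
    inherits(try(f(x), silent = TRUE), "try-error")
  }, logical(1))
  which(failed)
}

# Toy stand-in for a list of trees: f() fails on negative numbers.
toy <- list(1, -2, 3, -4)
bad <- failing_indices(toy, function(x) if (x < 0) stop("boom") else x)
print(bad)

# With real trees, something like the following (assumes ape is loaded
# and that tr and taxa exist as in the original post):
#   bad <- failing_indices(tr, function(t)
#     write.tree(drop.tip(t, setdiff(t$tip.label, taxa))))
#   dput(tr[[bad[1]]])     # a pasteable copy of one failing tree
#   # re-run the failing call on that single tree, then:
#   traceback()
#   packageVersion("ape"); sessionInfo()
```

The dput() output of a single failing tree is usually small enough to post
to the list even when the full set of trees is not.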

Regards,
Klaus




On 2/28/13, john d dobzhan...@gmail.com wrote:
 Dear all,

 I'm trying to prune a set of 1000 post-burnin trees to include only a
 subset of taxa. Unfortunately the tree is too big to send to the list,
 but if it is really necessary I'll figure out a way to do it.

 tr is my tree and taxa is my list of selected terminals.

 for(i in 1:1000){
   write.tree(drop.tip(tr[[i]],tr[[i]]$tip.label[-match(taxa,
 tr[[i]]$tip.label)]), file="result.tre", append=TRUE)
 }

 If I run that code, it works for some trees, but not for others, for
 which I got the message "Error in kids[[parent[i]]] : subscript out of
 bounds".

 Any suggestions?

 John

 ___
 R-sig-phylo mailing list - R-sig-phylo@r-project.org
 https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
 Searchable archive at
 http://www.mail-archive.com/r-sig-phylo@r-project.org/



-- 
Klaus Schliep
Phylogenomics Lab at the University of Vigo, Spain



Re: [R-sig-phylo] problem with drop.tip

2013-02-28 Thread Klaus Schliep
Hi John,
It seems the problem occurs within write.tree and not in the pruning.
So try pruning the trees first and then writing them out.

tr2 <- vector("list", 1000)
for (i in 1:1000) {
  # drop every tip that is not listed in taxa
  tr2[[i]] <- drop.tip(tr[[i]], tr[[i]]$tip.label[-match(taxa,
    tr[[i]]$tip.label)])
}
class(tr2) <- "multiPhylo"
plot(tr2)
# printing i may help you find the trees which fail
for (i in 1:1000) { print(i); write.tree(tr2[[i]]) }
write.tree(tr2, file = "result.tre")
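
One thing that may be worth ruling out, separately from the write.tree
error: match() returns NA for any name in taxa that is absent from a tree,
so the negative index built from it is no longer reliable, whereas
setdiff() has no such problem. A minimal sketch, assuming ape is installed;
tips_to_drop is a hypothetical helper name and the random trees from
rmtree only stand in for the real posterior sample:

```r
# Hypothetical helper: labels present in the tree but not in taxa.
tips_to_drop <- function(labels, taxa) setdiff(labels, taxa)

# A name missing from the tree is simply ignored.
print(tips_to_drop(c("t1", "t2", "t3", "t4"), c("t1", "t3", "not_in_tree")))

if (requireNamespace("ape", quietly = TRUE)) {
  set.seed(1)
  tr <- ape::rmtree(5, 6)              # 5 random trees with tips t1..t6
  taxa <- c("t1", "t2", "t3", "t4")
  pruned <- lapply(tr, function(t)
    ape::drop.tip(t, tips_to_drop(t$tip.label, taxa)))
  class(pruned) <- "multiPhylo"
  out <- tempfile(fileext = ".tre")
  ape::write.tree(pruned, file = out)  # one Newick string per tree
  print(length(readLines(out)))
}
```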

Cheers,
Klaus





-- 
Klaus Schliep
Phylogenomics Lab at the University of Vigo, Spain



[R-sig-phylo] Fwd: Deadline extended: Unifying paleobiological and comparative perspectives on character evolution, Lisbon 2013 (ESEB)

2013-02-28 Thread David Bapst
-- Forwarded message --
From: Lee Hsiang Liow l.h.l...@ibv.uio.no
Date: Thu, Feb 28, 2013 at 2:23 AM
Subject: Deadline extended: Unifying paleobiological and comparative
perspectives on character evolution, Lisbon 2013 (ESEB)

Dear Colleagues,

We would like to invite you to send abstracts to our symposium on
"Unifying paleobiological and comparative perspectives on character
evolution" for the 14th ESEB congress in Lisbon, taking place 19-24
August 2013.

Link: https://www.eseb2013.com/symposia

Deadline for submission has been extended to 8 March 2013.

Organizers: Lee Hsiang Liow & Thomas F. Hansen, University of Oslo,
Department of Biology, CEES.

Invited speakers: Gene Hunt and Folmer Bokma.

Summary: It is no longer debated that the fossil record is necessary
to inform us about the history of life, yet the integration of data
and perspectives using fossils and comparative data in understanding
evolution is far from mature.  This symposium gathers researchers
straddling the realms of the extinct and the extant to explore how we
can better understand evolutionary processes especially on time scales
common to palaeobiological and phylogenetic comparative studies, using
character evolution as a focal point.

Sincerely,
Lee Hsiang Liow & Thomas F. Hansen

--
David Bapst
Dept of Geophysical Sciences
University of Chicago
5734 S. Ellis
Chicago, IL 60637
http://home.uchicago.edu/~dwbapst/
http://cran.r-project.org/web/packages/paleotree/index.html



[R-sig-phylo] R Dendrograms for Subclones in a Somatic Cell Population: Time Interval Sampling.

2013-02-28 Thread sachs
Estimating Ongoing Evolution by Repeated Sampling with Long Time Intervals.

Is there a way to construct dendrograms similar to those used in
phylogenetics but with 2 main differences:
 (1) Instead of observing at one time, small samples from a very large
population are taken at regular intervals, so that some observed cells could
easily correspond to an internal node rather than a leaf.
 (2) There is no obvious outgroup; the root should if possible be estimated
by presuming that observations from later time points are on average farther
from the root.

More specifically, consider a large, heterogeneous, unstably evolving in
vitro cell culture apparently not subject to a Hayflick limit. In our
feasibility study, a sample of 20 cells was tested at t=0 for about 100
different numerical aspects of their karyotype (for each cell an ordered
vector of 100 numbers is measured from the genome; the individual numbers
all have the same order of magnitude).

About 15 cell generations later the observation is repeated, and similarly
four more times, for a total of 120 cells over a time span of about 60 cell
generations. I would like to estimate the behavior of the major subclones –
Are some spinning off new karyotypes? Which ones, if any, are in the process
of taking over? Are some being outcompeted? And so on.

Various difference matrices and binary dendrograms with the cells as leaves
are easily constructed and are suggestive. For example, at timepoint 5 one
karyotype which was prominent, with lots of duplicates, for timepoints 1-4
disappears from the samples. But the dendrograms themselves don’t really use
the fact that observations were made at six consecutive times rather than
simultaneously, and they require me to make a guess about where the root is.
There must be a better way to use the data. I assume people who work, say,
on development of drug-resistant bacterial lineages have thought this
through in some detail and developed R software for it, but I wasn’t able to
locate anything.

Thanks in Advance, Ray Sachs, Dept. Math, UCB


Re: [R-sig-phylo] R Dendrograms for Subclones in a Somatic Cell Population: Time Interval Sampling.

2013-02-28 Thread Brian O'Meara
An old parsimony-based approach to this is known as stratocladistics. There
are no R implementations, as far as I know, but you could wrap phangorn to
do this, I imagine, though it does require writing a new function.
Pseudocode:

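# my.taxa.vector is a character vector of tip labels
# strato() is the new function that still has to be written;
# data and times hold the character data and the sampling times
library(ape)
library(phangorn)
best.phy <- rtree(length(my.taxa.vector), tip.label = my.taxa.vector)
best.trees <- c(best.phy)
best.score <- parsimony(best.phy, data) + strato(best.phy, times)
for (i in sequence(nsteps)) {
  new.phy <- rSPR(best.trees[[1]])
  new.score <- parsimony(new.phy, data) + strato(new.phy, times)
  if (new.score == best.score) {
    # equally good tree: keep it alongside the current best set
    best.trees <- c(new.phy, best.trees)
  }
  if (new.score < best.score) {
    # strictly better tree: restart the set from it
    best.trees <- c(new.phy)
    best.score <- new.score
  }
}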

This would do a greedy SPR search; you'd want to restart from different
trees and such. The only tricky thing is writing the new strato
function: try a set of branch lengths and take as the best score the one
that implies the least amount of stratigraphic debt. ape's node.height
function could be useful for this. One thing that makes this easy is that
times and node heights must be integers (number of time points from the
root), so even searching exhaustively is feasible, though almost surely
unnecessary. Some of these branch lengths could be zero, indicating, in
the case of a terminal branch, a sample that is a direct ancestor of
another sample. The tree isn't quite rooted but it is polarized, with
nodes sampled further back in time pushed down the tree. Writing the
strato function really wouldn't be that bad to do.
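
One simple reading of the strato term, offered as a sketch rather than as
the standard stratocladistic measure: if every internal node is placed as
late as its descendants allow (at the earliest sampling time among the tips
below it), the summed branch lengths give the smallest total of implied
time intervals for that topology, so under this reading no search over
branch lengths is needed. strato_debt is a hypothetical name; it takes an
edge matrix in ape's layout (parent, child; tips numbered 1..n) and integer
sampling times in tip order, and the toy tree and times are made up:

```r
# Hypothetical helper, not part of ape or phangorn.
strato_debt <- function(edge, times) {
  n_tip <- length(times)
  # earliest sampling time among the tips descending from a node
  earliest <- function(node) {
    if (node <= n_tip) return(as.numeric(times[node]))
    kids <- edge[edge[, 1] == node, 2]
    min(vapply(kids, earliest, numeric(1)))
  }
  node_time <- vapply(seq_len(max(edge)), earliest, numeric(1))
  # implied branch length = time of child minus time of parent
  edge_length <- node_time[edge[, 2]] - node_time[edge[, 1]]
  list(debt = sum(edge_length), edge.length = edge_length)
}

# Toy tree ((A,B),C): tips A = 1, B = 2, C = 3; root = 4, inner node = 5.
edge <- rbind(c(4, 5), c(5, 1), c(5, 2), c(4, 3))
times <- c(A = 0, B = 2, C = 1)   # made-up sampling time points
res <- strato_debt(edge, times)
print(res$debt)
print(res$edge.length)
```

With an ape tree this could be called as strato_debt(phy$edge,
times[phy$tip.label]), assuming times is named by tip label. In the toy
example the terminal branch leading to A has length zero, which marks A as
a possible direct ancestor of B.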

Another approach is to do a likelihood search assuming a clock, but with
tips constrained to occur at time of sampling rather than being coeval.
Heibl and Cusimano's Lagopus package (not on CRAN, go to
http://www.christophheibl.de/mdt/mdtinr.html) calls PAML and multidivtime
to estimate a tree with age constraints. Multidivtime can use constraints
such that the tips are not coeval, but I'm not certain that Lagopus can
pass this information (it can certainly do node constraints, just not sure
about tip constraints). If it can, or could be modified to do so, this
would give you a tree with samples constrained to be at the right times and
with possibly zero length branches for truly ancestral samples. You could
then collapse these using di2multi in ape.
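
A small illustration of the collapsing step, assuming ape is installed; the
Newick string is made up. Note that di2multi collapses internal branches
shorter than tol, so a zero-length terminal branch (an ancestral sample)
would remain as a tip:

```r
if (requireNamespace("ape", quietly = TRUE)) {
  # Made-up tree: the internal branch above (A,B) has length zero.
  tr <- ape::read.tree(text = "((A:1,B:1):0,C:1);")
  collapsed <- ape::di2multi(tr, tol = 1e-8)
  print(tr$Nnode)         # internal nodes before collapsing
  print(collapsed$Nnode)  # internal nodes after collapsing
}
```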

Someone else may know of other ways to attack this problem.

Hope this helps,
Brian

___
Brian O'Meara
Assistant Professor
Dept. of Ecology & Evolutionary Biology
U. of Tennessee, Knoxville
http://www.brianomeara.info

Students wanted: Applications due Dec. 15, annually
Postdoc collaborators wanted: Check NIMBioS' website
Calendar: http://www.brianomeara.info/calendars/omeara

