Re: [R-sig-phylo] Why no branch lengths on consensus trees?

2012-11-22 Thread Emmanuel Paradis

Hi Scott,

I implemented only the consensus of the topology after reading the
relevant chapter in Inferring Phylogenies, so I'm glad that Joe himself
stepped in. I have added this reference to the help page (it was
already cited in my book with respect to this issue).


I think that the use of branch lengths on (or from) a consensus tree
depends on your perspective on phylogenetic estimation.


If you are a frequentist, the goal of resampling the data is to give an 
idea of the accuracy of the estimates. So you might not use the 
bootstrap trees to estimate branch lengths; you already did that by ML 
(or maybe least squares) with the full data.


If you are Bayesian, the trees sampled from an MCMC are here for 
estimation including of the branch lengths, so you use them to compute 
some sort of consensus topology as well as its branch lengths. So it 
makes sense that MrBayes can do a consensus tree with branch lengths.


Cheers,

Emmanuel

Scott Chamberlain wrote on 22/11/2012 09:41:

Dear Joe,

Thanks for your feedback on this question.  I will go read those pages you
mentioned.

Scott


On Wed, Nov 21, 2012 at 5:19 PM, Joe Felsenstein <j...@gs.washington.edu> wrote:

Daniel Barker wrote:

What should branch lengths on a consensus tree represent?

Scott Chamberlain had written:

When making a consensus tree using ape::consensus the branch lengths are
lost. Is there a way to not lose the branch lengths? Or to add them
somehow to the consensus tree after making it.


The issue of what branch lengths ought to be on a consensus tree is not
simple.  If we have three rooted trees:
((A:1,B:1):1,C:2);
((A:1,B:1):1,C:2);
(A:2,(B:1,C:1):1);

the consensus tree should be the first tree, but what branch length should
be used for (say) the branch ancestral to the AB clade?  1?   0.667?

The minute you open this can of worms it becomes clear that the answer
depends on what you want that number to convey and what interpretations
your audience will tend to draw from the number.  There is no obvious
answer. So this is not a mere technical computing question.

By the way, in my 2004 book, you will find me agonizing about this on page
526, coming down on the side of 0.667, but not overwhelmingly convincingly.
You could argue that a branch length should be set to 0 when the branch is
not there, and all the resulting values averaged, or you could argue that
the average should only be taken over those trees for which that branch is
present.
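The two averaging rules can be checked on the three example trees above with a minimal base-R sketch. It is an illustration, not code from this thread: `ab_lengths` and `average_branch()` are hypothetical names, and the AB branch is simply recorded as absent (NA) in the third tree, where A and B do not form a clade.

```r
# Length of the branch subtending the (A,B) clade in each of the three trees;
# NA means the clade is absent (the third tree groups B with C instead).
ab_lengths <- c(1, 1, NA)

# Hypothetical helper: average one branch over a set of trees, either
# counting an absent branch as length 0 or using only trees that contain it.
average_branch <- function(lengths, absent_as_zero = TRUE) {
  if (absent_as_zero) {
    mean(ifelse(is.na(lengths), 0, lengths))
  } else {
    mean(lengths, na.rm = TRUE)
  }
}

average_branch(ab_lengths, absent_as_zero = TRUE)   # 0.6666667
average_branch(ab_lengths, absent_as_zero = FALSE)  # 1
```

The first rule reproduces the 0.667 figure; the second gives 1.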

One possible way to solve the problem is to take the consensus tree as if
it were a user-defined tree, use your whole data set, and infer branch
lengths on that tree.  Daniel has already expressed his legitimate concerns
in such a case as to whether it takes (for example) trifurcations as if
they were real rather than an expression of our uncertainty.
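As a rough sketch of fitting branch lengths on a fixed (e.g. consensus) topology from the full data, the base-R example below swaps in ordinary least squares on a pairwise distance matrix (the least-squares option mentioned alongside ML) rather than likelihood. The four-taxon topology ((A,B),(C,D)), the distance values and the helper `fit_brlen()` are all hypothetical; the distances were chosen to be exactly additive so the fit can be checked.

```r
# Fixed, fully resolved topology ((A,B),(C,D)) with five edges: terminal
# edges a, b, c, d and the internal edge "int" separating AB from CD.
edges <- c("a", "b", "c", "d", "int")

# Path (design) matrix: one row per taxon pair, one column per edge;
# 1 if the edge lies on the path between that pair on this topology.
X <- rbind(
  AB = c(1, 1, 0, 0, 0),
  AC = c(1, 0, 1, 0, 1),
  AD = c(1, 0, 0, 1, 1),
  BC = c(0, 1, 1, 0, 1),
  BD = c(0, 1, 0, 1, 1),
  CD = c(0, 0, 1, 1, 0)
)
colnames(X) <- edges

# Hypothetical pairwise distances, in the same row order as X.
d <- c(AB = 3, AC = 4.5, AD = 5.5, BC = 5.5, BD = 6.5, CD = 7)

# Ordinary least-squares branch lengths on the fixed topology
# (qr.solve returns the least-squares fit for an overdetermined system).
fit_brlen <- function(X, d) {
  b <- qr.solve(X, d)
  names(b) <- colnames(X)
  b
}

brlen <- fit_brlen(X, d)
brlen  # a = 1, b = 2, c = 3, d = 4, int = 0.5 (up to rounding)
```

On real data the fit is not exact and unconstrained least squares can return negative lengths. If the consensus has a polytomy, the matching internal-edge column simply disappears, so the fit treats the multifurcation as a real one, which is the concern raised above.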

J.F.

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA

(from 1 October 2012 to 10 December 2012 on sabbatical leave at)
Department of Statistics, University of California, Berkeley, 367 Evans
Hall, Berkeley, CA  94710







___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo



--
Emmanuel Paradis
IRD, Jakarta, Indonesia
http://ape.mpl.ird.fr/



Re: [R-sig-phylo] Why no branch lengths on consensus trees?

2012-11-22 Thread Joe Felsenstein

Emmanuel Paradis wrote:

 If you are Bayesian, the trees sampled from an MCMC are here for estimation 
 including of the branch lengths, so you use them to compute some sort of 
 consensus topology as well as its branch lengths. So it makes sense that 
 MrBayes can do a consensus tree with branch lengths.


I endorse the rest of Emmanuel's advice but let me quibble with this one.  The 
posterior on trees may not consist mostly of trees varying around a single 
consensus.  If the posterior had, for example, two modes, each centered around 
a different tree, a single consensus tree might not be appropriate, and branch 
lengths computed by averaging lengths over the two modes might not be a good 
guide to what the trees in the posterior looked like.  I don't know enough 
about MrBayes features to know whether they have some way around this.
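A toy numerical illustration of the bimodality point, using made-up values rather than real MrBayes output: if the sampled lengths of one branch fall into two modes, their mean lands in the gap where no sampled tree lies.

```r
# Made-up posterior sample of one branch length: half the sampled trees
# near 0.1, half near 0.9 (two modes, e.g. around two different trees).
brlen_sample <- c(seq(0.08, 0.12, length.out = 50),
                  seq(0.88, 0.92, length.out = 50))

avg <- mean(brlen_sample)                  # about 0.5
closest <- min(abs(brlen_sample - avg))    # distance to nearest sampled value
quantile(brlen_sample, c(0.25, 0.75))      # quartiles sit inside the two modes
```

Here every sampled value is at least 0.38 away from the average, so the averaged branch length describes none of the sampled trees.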

There is a similar issue with parsimony methods -- the set of most parsimonious 
trees may have a consensus, which may well not be a most parsimonious tree. 
People who see the consensus of most parsimonious trees may not realize that 
the particular tree they are looking at is not most parsimonious.
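The parsimony point can be reproduced on a toy example. The base-R sketch below is not from the thread: it uses a Fitch-style parsimony count, generalized to multifurcations (at a node with k children the cost is k minus the largest number of children that share a state), on two made-up binary characters for four taxa. Two trees tie as most parsimonious, their strict consensus is a star tree, and scoring that star as a hard polytomy costs one extra step. The nested-list tree encoding and the names `fitch()`, `score()` and `toy_trees` are hypothetical.

```r
# Toy data: two binary characters scored for taxa A-D (made-up values).
chars <- list(
  c1 = c(A = 0, B = 0, C = 1, D = 1),  # supports (A,B) | (C,D)
  c2 = c(A = 0, B = 1, C = 0, D = 1)   # supports (A,C) | (B,D)
)

# Trees as nested lists of tip labels; the root position does not change
# the parsimony score, so unrooted four-taxon trees are written rooted.
toy_trees <- list(
  AB_CD = list(list("A", "B"), list("C", "D")),
  AC_BD = list(list("A", "C"), list("B", "D")),
  AD_BC = list(list("A", "D"), list("B", "C")),
  star  = list("A", "B", "C", "D")   # strict consensus of AB_CD and AC_BD
)

# Bottom-up pass for one character: returns the node's state set and cost.
fitch <- function(node, states) {
  if (is.character(node)) return(list(set = states[[node]], cost = 0))
  kids <- lapply(node, fitch, states = states)
  candidates <- unique(unlist(lapply(kids, `[[`, "set")))
  # number of children whose state set contains each candidate state
  counts <- vapply(candidates, function(s)
    sum(vapply(kids, function(k) s %in% k$set, logical(1))), numeric(1))
  best <- max(counts)
  list(set = candidates[counts == best],
       cost = sum(vapply(kids, `[[`, numeric(1), "cost")) + length(kids) - best)
}

# Total parsimony score of a tree over all characters.
score <- function(tree) {
  sum(vapply(chars, function(ch) fitch(tree, ch)$cost, numeric(1)))
}

sapply(toy_trees, score)  # AB_CD 3, AC_BD 3, AD_BC 4, star 4
```

Whether a consensus polytomy should be scored as a hard polytomy is itself debatable; the sketch only shows that the consensus need not be one of the most parsimonious trees.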

J.F.

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA

(from 1 October 2012 to 10 December 2012 on sabbatical leave at)
Department of Statistics, University of California, Berkeley, 367 Evans Hall, 
Berkeley, CA  94710



Re: [R-sig-phylo] Why no branch lengths on consensus trees?

2012-11-21 Thread Daniel Barker
Dear Scott,

What should branch lengths on a consensus tree represent?

They cannot be expected substitutions per residue. This would imply no
evolution at points where uncertain branching patterns have been reduced
to a multi-furcation - which is not what the multi-furcation is meant to
imply. (Rather: there was evolution, but we aren't very certain about the
branching pattern.)

But, MrBayes does provide average lengths of some kind.

Best wishes,

Daniel

On 21/11/2012 15:13, Scott Chamberlain <myrmecocys...@gmail.com> wrote:

When making a consensus tree using ape::consensus the branch lengths are
lost. Is there a way to not lose the branch lengths? Or to add them
somehow to the consensus tree after making it.

library(ape)

# write three owl trees (same topology, different branch lengths) to files
cat("owls(((Strix_aluco:4.2,Asio_otus:4.1):4.1,Athene_noctua:7.3):6.3,Tyto_alba:13.5);",
    file = "ex1.tre", sep = "\n")
cat("owls(((Strix_aluco:1.2,Asio_otus:4.5):3.1,Athene_noctua:7.3):6.3,Tyto_alba:13.5);",
    file = "ex2.tre", sep = "\n")
cat("owls(((Strix_aluco:3.2,Asio_otus:4.7):8.1,Athene_noctua:7.3):6.3,Tyto_alba:13.5);",
    file = "ex3.tre", sep = "\n")
tree1 <- read.tree("ex1.tre")
tree2 <- read.tree("ex2.tre")
tree3 <- read.tree("ex3.tre")
trees <- c(tree1, tree2, tree3)   # combine into a multiPhylo object
trees_con <- consensus(trees)
trees_con

Phylogenetic tree with 4 tips and 3 internal nodes.

Tip labels:
[1] "Strix_aluco"   "Asio_otus"     "Athene_noctua" "Tyto_alba"

Rooted; no branch lengths.


Thanks, Scott Chamberlain




-- 
Daniel Barker
http://bio.st-andrews.ac.uk/staff/db60.htm
The University of St Andrews is a charity registered in Scotland : No
SC013532



Re: [R-sig-phylo] Why no branch lengths on consensus trees?

2012-11-21 Thread Brian O'Meara
I have a function to create a consensus tree with branch lengths. You feed
it a given topology (often a consensus topology, made with ape), then a
list of trees, and tell it what you want the branch lengths to represent.
It could be the proportion of input trees with that edge (good for
summarizing bootstrap or Bayes proportions) or the mean, median, or sd of
branch lengths for those trees that have that edge. Consensus branch
lengths in units of proportion of matching trees have obvious utility. As
Daniel says, it is harder to see a use case for averaging branch lengths
across a set of trees, but you could imagine, for example, taking the
ratogram output from r8s on a set of trees and summarizing the mean and sd
of the rates on a given (best) tree as two sets of branch lengths on that
tree.
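The kinds of per-edge summary listed above can be sketched in base R for a single edge. This is not the consensusBrlen() code (which works on phylo/phylobase objects); it uses a hypothetical encoding in which each tree is a named vector mapping a clade (tip labels sorted and joined by "|") to the length of the branch above it, and the type names and `summarise_edge()` are made up for illustration. The lengths are read off Scott's three owl trees from his original question.

```r
# Each tree: named vector, clade -> length of the branch subtending it.
# Values are the internal branch lengths of Scott's three owl trees.
owl_trees <- list(
  c("Asio_otus|Strix_aluco" = 4.1, "Asio_otus|Athene_noctua|Strix_aluco" = 6.3),
  c("Asio_otus|Strix_aluco" = 3.1, "Asio_otus|Athene_noctua|Strix_aluco" = 6.3),
  c("Asio_otus|Strix_aluco" = 8.1, "Asio_otus|Athene_noctua|Strix_aluco" = 6.3)
)

# Hypothetical helper: summarise one edge across trees. "proportion" counts
# the trees containing the clade; the other types use only those trees.
summarise_edge <- function(trees, clade,
                           type = c("proportion", "mean", "median", "sd")) {
  type <- match.arg(type)
  lens <- vapply(trees, function(tr)
    if (clade %in% names(tr)) tr[[clade]] else NA_real_, numeric(1))
  present <- lens[!is.na(lens)]
  switch(type,
         proportion = length(present) / length(trees),
         mean       = mean(present),
         median     = median(present),
         sd         = sd(present))
}

sa <- "Asio_otus|Strix_aluco"
summarise_edge(owl_trees, sa, "proportion")  # 1
summarise_edge(owl_trees, sa, "mean")        # 5.1
summarise_edge(owl_trees, sa, "median")      # 4.1
summarise_edge(owl_trees, sa, "sd")          # 2.645751
```

Because all three owl trees share one topology, every proportion is 1 here; with conflicting trees the mean, median and sd would be computed only over the trees that contain the clade.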

I've put the function source here:
https://r-forge.r-project.org/scm/viewvc.php/*checkout*/pkg/R/consensusBrlen.R?revision=110&root=omearalab
You can source the file to get the function (consensusBrlen()) and the other
functions it needs. It also uses phylobase. Note that this is alpha-quality
code -- it's been checked a bit, but verify it's doing what you want.

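Here's an example of how to use it:

library(ape)
library(phylobase)
# consensusBrlen() itself comes from sourcing the file linked above
phy.a <- rcoal(15)                  # random 15-tip coalescent tree
phy.b <- phy.a                      # same topology as phy.a ...
phy.b$edge.length <- phy.b$edge.length +
  runif(length(phy.b$edge.length), 0, 0.1)  # ... with jittered branch lengths
phy.c <- rcoal(15)                  # an unrelated random tree
phy.list <- list(phy.a, phy.b, phy.c)
phy.consensus <- consensusBrlen(phy.a, phy.list, type = "mean_brlen")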


Best,
Brian


PS: Note that I am actively looking for grad students: info at
http://www.brianomeara.info/lab . Guaranteed five years support, subject to
decent performance.

___
Brian O'Meara
Assistant Professor
Dept. of Ecology & Evolutionary Biology
U. of Tennessee, Knoxville
http://www.brianomeara.info

Students wanted: Applications due Dec. 15, annually
Postdoc collaborators wanted: Check NIMBioS' website
Calendar: http://www.brianomeara.info/calendars/omeara


On Wed, Nov 21, 2012 at 11:09 AM, Daniel Barker <d...@st-andrews.ac.uk> wrote:

 What should branch lengths on a consensus tree represent?



Re: [R-sig-phylo] Why no branch lengths on consensus trees?

2012-11-21 Thread Scott Chamberlain
Dear Brian,

Awesome. Thanks for sharing the code, Brian. I will give it a try. I now
see more precisely what you all mean about the question of what it really
means to have branch lengths on a consensus tree.

Thanks, Scott


On Wed, Nov 21, 2012 at 11:10 AM, Brian O'Meara <bome...@utk.edu> wrote:


 I have a function to create a consensus tree with branch lengths. You feed
 it a given topology (often a consensus topology, made with ape), then a
 list of trees, and tell it what you want the branch lengths to represent.


Re: [R-sig-phylo] Why no branch lengths on consensus trees?

2012-11-21 Thread Scott Chamberlain
Dear Joe,

Thanks for your feedback on this question.  I will go read those pages you
mentioned.

Scott

