Re: [R-sig-phylo] Why no branch lengths on consensus trees?

2012-11-22 Thread Emmanuel Paradis

Hi Scott,

I implemented only the consensus of the topology after reading the
relevant chapter in Inferring Phylogenies, so I'm glad that Joe himself
stepped in. I have added this reference to the help page (it was
already cited in my book with respect to this issue).


I think that the usage of branch lengths on (or from) a consensus tree 
depends on your perspective on phylogenetic estimation.


If you are a frequentist, the goal of resampling the data is to give an 
idea of the accuracy of the estimates. So you might not use the 
bootstrap trees to estimate branch lengths; you already did that by ML 
(or maybe least squares) with the full data.


If you are Bayesian, the trees sampled from an MCMC are there for
estimation, including estimation of the branch lengths, so you use them
to compute some sort of consensus topology as well as its branch
lengths. So it makes sense that MrBayes can produce a consensus tree
with branch lengths.


Cheers,

Emmanuel

Scott Chamberlain wrote on 22/11/2012 09:41:

Dear Joe,

Thanks for your feedback on this question.  I will go read those pages you
mentioned.

Scott


On Wed, Nov 21, 2012 at 5:19 PM, Joe Felsenstein j...@gs.washington.edu wrote:



Daniel Barker wrote:

What should branch lengths on a consensus tree represent?

Scott Chamberlain had written:


When making a consensus tree using ape::consensus the branch lengths are
lost. Is there a way to not lose the branch lengths? Or to add them somehow
to the consensus tree after making it.


The issue of what branch lengths ought to be on a consensus tree is not
simple.  If we have three rooted trees:
((A:1,B:1):1,C:2);
((A:1,B:1):1,C:2);
(A:2,(B:1,C:1):1);

the consensus tree should be the first tree, but what branch length should
be used for (say) the branch ancestral to the AB clade?  1?   0.667?

The minute you open this can of worms it becomes clear that the answer
depends on what you want that number to convey and what interpretations
your audience will tend to draw from the number.  There is no obvious
answer. So this is not a mere technical computing question.

By the way, in my 2004 book, you will find me agonizing about this on page
526, coming down on the side of 0.667, but not overwhelmingly convincingly.
You could argue that a branch length should be set to 0 when the branch is
not there, and all the resulting values averaged, or you could argue that
the average should only be taken over those trees for which that branch is
present.
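
The two averaging conventions can be checked by hand on the three example
trees. The sketch below is in Python rather than R, and it hand-encodes each
tree as a mapping from a clade to the length of the branch ancestral to it.
That encoding and the function names (support, mean_with_zeros,
mean_when_present) are assumptions made for illustration; they are not part
of ape or any other package.

```python
from fractions import Fraction

# The three rooted trees from the message, hand-encoded as
# {clade (frozenset of tip labels): length of the branch ancestral to it}.
# Only internal branches are recorded; tip branches are left out.
trees = [
    {frozenset("AB"): Fraction(1)},  # ((A:1,B:1):1,C:2);
    {frozenset("AB"): Fraction(1)},  # ((A:1,B:1):1,C:2);
    {frozenset("BC"): Fraction(1)},  # (A:2,(B:1,C:1):1);
]


def support(clade):
    """Fraction of the trees that contain the clade."""
    return Fraction(sum(clade in t for t in trees), len(trees))


def mean_with_zeros(clade):
    """Average over all trees, using length 0 where the branch is absent."""
    return sum(t.get(clade, Fraction(0)) for t in trees) / len(trees)


def mean_when_present(clade):
    """Average only over the trees in which the branch is present."""
    lengths = [t[clade] for t in trees if clade in t]
    return sum(lengths) / len(lengths)


ab = frozenset("AB")
print("support for AB:       ", support(ab))             # 2/3
print("mean, absent counts 0:", mean_with_zeros(ab))     # 2/3
print("mean, present only:   ", mean_when_present(ab))   # 1
```

Under the first convention the AB branch gets 2/3 (the 0.667 above); under
the second it gets 1. Neither number is wrong as arithmetic; they answer
different questions.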

One possible way to solve the problem is to take the consensus tree as if
it were a user-defined tree, use your whole data set, and infer branch
lengths on that tree.  Daniel has already expressed his legitimate concerns
in such a case as to whether it takes (for example) trifurcations as if
they were real rather than an expression of our uncertainty.

J.F.

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA

(from 1 October 2012 to 10 December 2012 on sabbatical  leave at)
Department of Statistics, University of California, Berkeley, 367 Evans
Hall, Berkeley, CA  94710







___
R-sig-phylo mailing list
R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo



--
Emmanuel Paradis
IRD, Jakarta, Indonesia
http://ape.mpl.ird.fr/



Re: [R-sig-phylo] Why no branch lengths on consensus trees?

2012-11-22 Thread Joe Felsenstein

Emmanuel Paradis wrote:

 If you are Bayesian, the trees sampled from an MCMC are there for estimation,
 including estimation of the branch lengths, so you use them to compute some
 sort of consensus topology as well as its branch lengths. So it makes sense
 that MrBayes can produce a consensus tree with branch lengths.


I endorse the rest of Emmanuel's advice but let me quibble with this one.  The 
posterior on trees may not consist mostly of trees varying around a single 
consensus.  If the posterior had, for example, two modes, each centered around 
a different tree, a single consensus tree might not be appropriate, and branch 
lengths computed by averaging lengths over the two modes might not be a good 
guide to what the trees in the posterior looked like.  I don't know enough 
about MrBayes features to know whether they have some way around this.
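
A toy case may make the two-mode concern concrete. The sketch below is in
Python and uses an invented posterior sample (the numbers are hypothetical and
do not come from any analysis): half the sampled trees have an AB clade, the
other half a BC clade. The dictionary encoding, the function names, and the
strict "more than 50%" majority rule are all assumptions for illustration and
say nothing about how MrBayes summarizes its samples.

```python
from statistics import mean

AB, BC = frozenset("AB"), frozenset("BC")

# Hypothetical posterior sample with two equally weighted modes.
# Each tree is {clade: length of the branch ancestral to that clade}.
sample = [{AB: 0.9}] * 50 + [{BC: 0.8}] * 50


def frequency(clade):
    """Proportion of sampled trees that contain the clade."""
    return sum(clade in tree for tree in sample) / len(sample)


def mean_over_all(clade):
    """Mean branch length over all trees, with 0 where the branch is absent."""
    return mean([tree.get(clade, 0.0) for tree in sample])


def majority_clades():
    """Clades found in strictly more than half of the sampled trees."""
    clades = {c for tree in sample for c in tree}
    return sorted("".join(sorted(c)) for c in clades if frequency(c) > 0.5)


print("frequency of AB:    ", frequency(AB))
print("averaged AB length: ", mean_over_all(AB))
print("majority-rule clades:", majority_clades())
```

In this invented sample no clade exceeds 50%, so a strict majority-rule
consensus is unresolved, and the averaged AB length falls between 0 and 0.9,
a value that none of the sampled trees actually has.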

There is a similar issue with parsimony methods -- the set of most parsimonious 
trees may have a consensus, which may well not be a most parsimonious tree. 
People who see the consensus of most parsimonious trees may not realize that 
the particular tree they are looking at is not most parsimonious.

J.F.

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA

(from 1 October 2012 to 10 December 2012 on sabbatical  leave at)
Department of Statistics, University of California, Berkeley, 367 Evans Hall, 
Berkeley, CA  94710
