Re: [R-sig-phylo] Plotting Posterior Probabilities from MrBayes Trees (An Update)

Klaus Schliep Wed, 08 Oct 2014 15:22:48 -0700

Hi all,

here is a small function which it is very similar to David's 2nd script,
but it is a bit more compact:
library(phangorn) # requires the newest pangorn version >= 1.99-8
addConfidences.phylo <- function(to, from){
    conf = attr(addConfidences(as.splits(to), from), "confidences")
    nTips = length(to$tip.label)
    to$node.label = conf[-c(1:nTips)]
    to
}
treeA <- addConfidences.phylo(treeA, treeB)


I will fix also my midpoint rooting function.

And some comments on possible problems / features of the 2nd script:
The function assumes that the ordering of the tip labels is identical with
the effect

"t3" "t4"

is different to

"t4" "t3"

A lapply(splitsA, sort) and lapply(splitsB, sort) should do the trick.
Now the feature:
let assume there a 5 taxa  "t1" "t2" "t3" "t4" "t5"
A clade containing "t1" "t2"
will be different from "t3" "t4" "t5".
If you want just to reroot a tree, you probably do not want to distinguish
clades but only splits.

Regards,
Klaus


On Wed, Oct 8, 2014 at 10:51 AM, David Bapst <dwba...@gmail.com> wrote:

> Hello all,
>
> Unable to get phyloch's read.mrbayes() to read my .con.tre files (for
> reasons unknown), I've gone with Graham's suggestion to use the simple
> output format from sumt. However, as I have a large number of analyses
> which took considerable computational time (and I am stubborn to learn
> how to run these on a cluster) I decided the more economical thing to
> do was to write an R script that took the existing nexus files and
> wrote a nexus file with MrBayes blocks that would output 'simple'
> format consensus trees. I realize such string manipulation might be
> easier in some other language, but I know R best.
>
> Here it is, in case anyone else ever needs such a thing:
>
> _________________________________________________
>
> #change to dir for MB files
> setwd("C:/dave/workspace/mrbayes")
>
> #get files
> files<-list.files()
> #get tree files
> treeFiles<-files[grep(x=files,pattern="run..t")]
> #get the nexus files
> inputs<-sort(unique(sub(".run..t","",treeFiles)))
> inputs<-sort(unique(sub(".tree.","",inputs)))
>
> #now start constructing sumt lines
> bayesBlock<-"#NEXUS"
> for(i in 1:length(inputs)){
>
> origNex<-scan(inputs[i],what="character",sep="\n",blank.lines.skip=FALSE)
>     cropNex<-origNex[2:grep(origNex,pattern="mcmcp")]
>     #remove log line
>     cropNex<-cropNex[-grep(cropNex,pattern="log start")]
>     sumtLine<-paste0("sumt filename=",inputs[i]," outputname=",
>         paste0(inputs[i],".simple")," conformat=simple;")
>     bayesBlock<-c(bayesBlock,cropNex,sumtLine,"","end;","")
>     }
>
> write(bayesBlock,file="simple_sumt.nex")
>
> _______________________________________________
>
> Some caveats: this script assumes you already have mrbayes blocks in
> your nexus files (all of which are stored with all output in the same
> directory), including a line where you set MCMC parameters (mcmcp)
> just prior to calling the MCMC itself and after any other
> modifications to the dataset, model or prior settings.
>
> If you execute the nexus file created, it should run sumt and output
> new output files with 'simple' in the file name. Analyses where there
> are multiple trees (such as partitioned analyses with unlinked branch
> lengths) will have multiple consensus trees estimated for them, as
> expected if we had originally done this immediately after running
> mcmc. Also note I crop out any lines that create new log files,
> because I didn't want to write over my old log files.
>
> The resulting simple-format consensus tree files can be read into ape
> with read.nexus just fine. For whatever reason, these nexus files
> contain two trees, otherwise identical other than that one contains
> the posterior probabilities as its node-labels.
>
> ....However! There is a big however here.
>
> If we re-root the tree in ape (and there are many reasons why we might
> want to do so), the posterior probabilities, which are stored as node
> labels, no longer make sense as those node labels are not properties
> technically of the nodes but of the edges preceding those nodes.
> Change the directionality of the graph by rerooting and those
> posterior probabilities end up in the wrong place. (TreeFig apparently
> figures this out on its own, automatically, if your reroot in
> TreeFig.)
>
> I haven't found any option in an R package that slides node-labels
> around as if they were support values when the tree is rerooted (and
> if there is one, please let me know!) so I had to write my own thing
> for doing that to the newly created 'simple' consensus tree files...
>
> So here's another script.
>
> _____________________________________________
>
> #change dir
> setwd("C:/dave/workspace/mrbayes")
>
> library(ape)
>
> files<-list.files()
> #get consensus nexus files (ugh regexp)
> conFiles<-files[grep(x=files,pattern=".con.tre")]
> #get only 'simple' files
> conFiles<-conFiles[grep(x=conFiles,pattern="simple")]
> conName<-sub(".nex.simple","",conFiles)
> conName<-sub(".con.tre","",conName)
>
> trees<-list()
> for(i in 1:length(conFiles)){
>     treesRead<-read.nexus(conFiles[i])
>     treeA<-treesRead[[2]]
>     treeB<-treesRead[[1]]
>     #now we're going to reroot and let's use fake species names hahah
>
> treeA<-root(treeA,resolve.root=TRUE,outgroup=c("OutOfTheGroup","OutThere",
>         "outgroupy","groupout"))
>     #okay basically need to figure out
>         #which taxa are included by a node
>         #and match between treeB and tree A
>     splitsB<-lapply(prop.part(treeB),function(x)
>         if(any(x==1)){treeB$tip.label[x]
>         }else{treeB$tip.label[-x]})
>     splitsA<-lapply(prop.part(treeA),function(x)
>         if(any(x==1)){treeA$tip.label[x]
>         }else{treeA$tip.label[-x]})
>     #plot(treeB,show.node.label=TRUE)
>     treeA$node.label<-treeB$node.label[match(splitsA,splitsB)]
>     trees[[i]]<-treeA
>     }
> class(trees)<-"multiPhylo"
>
> #plot(trees)
>
> pdf("consensus_trees_10-07-14.pdf",height=8,width=8)
> for(i in 1:length(trees)){
>     plot(trees[[i]],main=conName[i],show.node.label=TRUE)
>     add.scale.bar()
>     }
> dev.off()
>
> _____________________________________________________
>
> The last bit just plots it in a pdf, which isn't important at all but
> I included it for the heck of it. The important stuff is the innards
> of the first for loop, which reads in the 'simple' consensus tree
> nexus file, reroots it and then rearranges the posterior probabilities
> to match the same splits across both trees. I've checked this against
> the read out logged by MrBayes and the result of rerooted trees in
> FigTree, and the resulting posterior probabilities match those shown
> across all three programs.
>
> Anyway, I hope this helps the next person trying to figure out how to
> plot posterior probabilites from MrBayes in R!
>
> Cheers,
> -Dave
>
>
>
> On Fri, Oct 3, 2014 at 10:29 AM, David Bapst <dwba...@gmail.com> wrote:
> > Just to clarify, off-list discussion with Graham reveals (after some
> > confusion on my part) that if the simple format option is used in sumt
> > in MrBayes, then ape's read.nexus will natively read the posterior
> > probability values as node-labels. Which is interesting and I had not
> > come across that information previously.
> >
> > Graham's comment is not in reference to the functioning of
> read.annotated.nexus.
> >
> > Cheers,
> > -Dave
> >
> >
> > On Fri, Oct 3, 2014 at 8:49 AM, Slater, Graham <slat...@si.edu> wrote:
> >> Dave,
> >>
> >> I've had the same trouble with shuffling. However, all of this can be
> >> avoided if you specify the simple format for your .con file in the
> mrBayes
> >> block.
> >>
> >> sumt conformat = simple;
> >>
> >> The resulting tree will correctly display posterior probabilities in a
> >> phyloformat tree.
> >>
> >> Graham
> >>
> >>
> >>
> >> ------------------------------------------------------------
> >> Graham Slater
> >> Peter Buck Post-Doctoral Fellow
> >> Department of Paleobiology
> >> National Museum of Natural History
> >> The Smithsonian Institution [NHB, MRC 121]
> >> P.O. Box 37012
> >>
> >>
> >> (202) 633-1316
> >> slat...@si.edu
> >> www.fourdimensionalbiology.com
> >>
> >>
> >>
> >>
> >>
> >> On Oct 3, 2014, at 10:42 AM, David Bapst <dwba...@gmail.com> wrote:
> >>
> >> Hello all,
> >>
> >> Recently, I wanted to display posterior probabilities on a 50%
> >> compatibility tree from a MrBayes run, created with the 'sumt'
> >> command. I looked around for ways to do this and found this email
> >> thread from last year:
> >>
> >> https://stat.ethz.ch/pipermail/r-sig-phylo/2013-June/002825.html
> >>
> >> ...which suggests using read.annotated.nexus() in package epibase,
> >> which has been renamed OutbreakTools more recently.
> >>
> >> Unfortunately, it appears read.annotated.nexus in OutbreakTools
> >> (v0.1-11, in R 3.1.1) improperly creates the new edge matrix,
> >> resulting in an apparent shuffling of tip labels. As this function was
> >> discussed on R-Sig-Phylo, I am sending this bug report to both the
> >> list and to the package maintainer (Thibaut J.).
> >>
> >> I don't have any BEAST output files, so I cannot test if this occurs
> >> with files that are not created by MrBayes.
> >>
> >> To show this phenomen, I have created an example .con.tre file using
> >> an example matrix of standard characters for several elephants that I
> >> stole off of Joe F.'s website. You can find the .con.tre file here on
> >> gdrive:
> >>
> >>
> https://drive.google.com/file/d/0B_xvEcEvKno_LTFhSVJrdHMtNFU/view?usp=sharing
> >>
> >> And now in R, a comparison with a tree:
> >>
> >> library(OutbreakTools)
> >> library(ape)
> >>
> >> tree1<-read.nexus("mat.nex.con.tre")
> >> tree2<-read.annotated.nexus("mat.nex.con.tre")
> >>
> >> layout(1:2)
> >> plot(tree1,no.margin=TRUE)
> >> plot(tree2,no.margin=TRUE)
> >>
> >> identical(tree1$tip.label,tree2$tip.label)
> >> identical(tree1$edge.length,tree2$edge.length)
> >> identical(tree1$edge,tree2$edge)
> >>
> >> Inspection of the tree file with FigTree suggests that the ape
> >> function is producing the correct tree and the OutbreakTools function
> >> is not. We can see the why with last three lines in the console:
> >>
> >> identical(tree1$tip.label,tree2$tip.label)
> >>
> >> [1] TRUE
> >>
> >> identical(tree1$edge.length,tree2$edge.length)
> >>
> >> [1] TRUE
> >>
> >> identical(tree1$edge,tree2$edge)
> >>
> >> [1] FALSE
> >>
> >> Closer inspection suggests that the issue appears to be that some
> >> terminal edges are inappropriately shuffled, thus shuffling the
> >> terminal tip labels on those edges, while leaving the tip.label order
> >> and the edge lengths the same. Also, larger trees seem to produce
> >> larger inconsistencies in the tree produced.
> >>
> >> So... now for the list, are there any other methods on CRAN for
> >> obtaining the posterior probabilities from a .con.tre file? Otherwise,
> >> next I guess I'll be trying phyloch.
> >>
> >> Cheers,
> >> -Dave
> >>
> >> --
> >> David W. Bapst, PhD
> >> Adjunct Asst. Professor, Geology and Geol. Eng.
> >> South Dakota School of Mines and Technology
> >> 501 E. St. Joseph
> >> Rapid City, SD 57701
> >>
> >> http://webpages.sdsmt.edu/~dbapst/
> >> http://cran.r-project.org/web/packages/paleotree/index.html
> >>
> >> _______________________________________________
> >> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> >> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> >> Searchable archive at
> http://www.mail-archive.com/r-sig-phylo@r-project.org/
> >>
> >>
> >
> >
> >
> > --
> > David W. Bapst, PhD
> > Adjunct Asst. Professor, Geology and Geol. Eng.
> > South Dakota School of Mines and Technology
> > 501 E. St. Joseph
> > Rapid City, SD 57701
> >
> > http://webpages.sdsmt.edu/~dbapst/
> > http://cran.r-project.org/web/packages/paleotree/index.html
>
>
>
> --
> David W. Bapst, PhD
> Adjunct Asst. Professor, Geology and Geol. Eng.
> South Dakota School of Mines and Technology
> 501 E. St. Joseph
> Rapid City, SD 57701
>
> http://webpages.sdsmt.edu/~dbapst/
> http://cran.r-project.org/web/packages/paleotree/index.html
>
> _______________________________________________
> R-sig-phylo mailing list - R-sig-phylo@r-project.org
> https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
> Searchable archive at
> http://www.mail-archive.com/r-sig-phylo@r-project.org/




-- 
Klaus Schliep
Postdoctoral Fellow
Revell Lab, University of Massachusetts Boston

        [[alternative HTML version deleted]]

_______________________________________________
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/

Re: [R-sig-phylo] Plotting Posterior Probabilities from MrBayes Trees (An Update)

Reply via email to