Re: [R-sig-phylo] Plotting Posterior Probabilities from MrBayes Trees (An Update)
Hello all, Unable to get phyloch's read.mrbayes() to read my .con.tre files (for reasons unknown), I've gone with Graham's suggestion to use the simple output format from sumt. However, as I have a large number of analyses which took considerable computational time (and I am stubborn to learn how to run these on a cluster) I decided the more economical thing to do was to write an R script that took the existing nexus files and wrote a nexus file with MrBayes blocks that would output 'simple' format consensus trees. I realize such string manipulation might be easier in some other language, but I know R best. Here it is, in case anyone else ever needs such a thing: _ #change to dir for MB files setwd(C:/dave/workspace/mrbayes) #get files files-list.files() #get tree files treeFiles-files[grep(x=files,pattern=run..t)] #get the nexus files inputs-sort(unique(sub(.run..t,,treeFiles))) inputs-sort(unique(sub(.tree.,,inputs))) #now start constructing sumt lines bayesBlock-#NEXUS for(i in 1:length(inputs)){ origNex-scan(inputs[i],what=character,sep=\n,blank.lines.skip=FALSE) cropNex-origNex[2:grep(origNex,pattern=mcmcp)] #remove log line cropNex-cropNex[-grep(cropNex,pattern=log start)] sumtLine-paste0(sumt filename=,inputs[i], outputname=, paste0(inputs[i],.simple), conformat=simple;) bayesBlock-c(bayesBlock,cropNex,sumtLine,,end;,) } write(bayesBlock,file=simple_sumt.nex) ___ Some caveats: this script assumes you already have mrbayes blocks in your nexus files (all of which are stored with all output in the same directory), including a line where you set MCMC parameters (mcmcp) just prior to calling the MCMC itself and after any other modifications to the dataset, model or prior settings. If you execute the nexus file created, it should run sumt and output new output files with 'simple' in the file name. Analyses where there are multiple trees (such as partitioned analyses with unlinked branch lengths) will have multiple consensus trees estimated for them, as expected if we had originally done this immediately after running mcmc. Also note I crop out any lines that create new log files, because I didn't want to write over my old log files. The resulting simple-format consensus tree files can be read into ape with read.nexus just fine. For whatever reason, these nexus files contain two trees, otherwise identical other than that one contains the posterior probabilities as its node-labels. However! There is a big however here. If we re-root the tree in ape (and there are many reasons why we might want to do so), the posterior probabilities, which are stored as node labels, no longer make sense as those node labels are not properties technically of the nodes but of the edges preceding those nodes. Change the directionality of the graph by rerooting and those posterior probabilities end up in the wrong place. (TreeFig apparently figures this out on its own, automatically, if your reroot in TreeFig.) I haven't found any option in an R package that slides node-labels around as if they were support values when the tree is rerooted (and if there is one, please let me know!) so I had to write my own thing for doing that to the newly created 'simple' consensus tree files... So here's another script. _ #change dir setwd(C:/dave/workspace/mrbayes) library(ape) files-list.files() #get consensus nexus files (ugh regexp) conFiles-files[grep(x=files,pattern=.con.tre)] #get only 'simple' files conFiles-conFiles[grep(x=conFiles,pattern=simple)] conName-sub(.nex.simple,,conFiles) conName-sub(.con.tre,,conName) trees-list() for(i in 1:length(conFiles)){ treesRead-read.nexus(conFiles[i]) treeA-treesRead[[2]] treeB-treesRead[[1]] #now we're going to reroot and let's use fake species names hahah treeA-root(treeA,resolve.root=TRUE,outgroup=c(OutOfTheGroup,OutThere, outgroupy,groupout)) #okay basically need to figure out #which taxa are included by a node #and match between treeB and tree A splitsB-lapply(prop.part(treeB),function(x) if(any(x==1)){treeB$tip.label[x] }else{treeB$tip.label[-x]}) splitsA-lapply(prop.part(treeA),function(x) if(any(x==1)){treeA$tip.label[x] }else{treeA$tip.label[-x]}) #plot(treeB,show.node.label=TRUE) treeA$node.label-treeB$node.label[match(splitsA,splitsB)] trees[[i]]-treeA } class(trees)-multiPhylo #plot(trees) pdf(consensus_trees_10-07-14.pdf,height=8,width=8) for(i in 1:length(trees)){ plot(trees[[i]],main=conName[i],show.node.label=TRUE) add.scale.bar() } dev.off() _ The last bit just plots it in a pdf, which isn't important at all but I included it for the heck of it. The important stuff is the innards of the first for loop, which reads in
Re: [R-sig-phylo] Plotting Posterior Probabilities from MrBayes Trees (An Update)
Hi all, here is a small function which it is very similar to David's 2nd script, but it is a bit more compact: library(phangorn) # requires the newest pangorn version = 1.99-8 addConfidences.phylo - function(to, from){ conf = attr(addConfidences(as.splits(to), from), confidences) nTips = length(to$tip.label) to$node.label = conf[-c(1:nTips)] to } treeA - addConfidences.phylo(treeA, treeB) I will fix also my midpoint rooting function. And some comments on possible problems / features of the 2nd script: The function assumes that the ordering of the tip labels is identical with the effect t3 t4 is different to t4 t3 A lapply(splitsA, sort) and lapply(splitsB, sort) should do the trick. Now the feature: let assume there a 5 taxa t1 t2 t3 t4 t5 A clade containing t1 t2 will be different from t3 t4 t5. If you want just to reroot a tree, you probably do not want to distinguish clades but only splits. Regards, Klaus On Wed, Oct 8, 2014 at 10:51 AM, David Bapst dwba...@gmail.com wrote: Hello all, Unable to get phyloch's read.mrbayes() to read my .con.tre files (for reasons unknown), I've gone with Graham's suggestion to use the simple output format from sumt. However, as I have a large number of analyses which took considerable computational time (and I am stubborn to learn how to run these on a cluster) I decided the more economical thing to do was to write an R script that took the existing nexus files and wrote a nexus file with MrBayes blocks that would output 'simple' format consensus trees. I realize such string manipulation might be easier in some other language, but I know R best. Here it is, in case anyone else ever needs such a thing: _ #change to dir for MB files setwd(C:/dave/workspace/mrbayes) #get files files-list.files() #get tree files treeFiles-files[grep(x=files,pattern=run..t)] #get the nexus files inputs-sort(unique(sub(.run..t,,treeFiles))) inputs-sort(unique(sub(.tree.,,inputs))) #now start constructing sumt lines bayesBlock-#NEXUS for(i in 1:length(inputs)){ origNex-scan(inputs[i],what=character,sep=\n,blank.lines.skip=FALSE) cropNex-origNex[2:grep(origNex,pattern=mcmcp)] #remove log line cropNex-cropNex[-grep(cropNex,pattern=log start)] sumtLine-paste0(sumt filename=,inputs[i], outputname=, paste0(inputs[i],.simple), conformat=simple;) bayesBlock-c(bayesBlock,cropNex,sumtLine,,end;,) } write(bayesBlock,file=simple_sumt.nex) ___ Some caveats: this script assumes you already have mrbayes blocks in your nexus files (all of which are stored with all output in the same directory), including a line where you set MCMC parameters (mcmcp) just prior to calling the MCMC itself and after any other modifications to the dataset, model or prior settings. If you execute the nexus file created, it should run sumt and output new output files with 'simple' in the file name. Analyses where there are multiple trees (such as partitioned analyses with unlinked branch lengths) will have multiple consensus trees estimated for them, as expected if we had originally done this immediately after running mcmc. Also note I crop out any lines that create new log files, because I didn't want to write over my old log files. The resulting simple-format consensus tree files can be read into ape with read.nexus just fine. For whatever reason, these nexus files contain two trees, otherwise identical other than that one contains the posterior probabilities as its node-labels. However! There is a big however here. If we re-root the tree in ape (and there are many reasons why we might want to do so), the posterior probabilities, which are stored as node labels, no longer make sense as those node labels are not properties technically of the nodes but of the edges preceding those nodes. Change the directionality of the graph by rerooting and those posterior probabilities end up in the wrong place. (TreeFig apparently figures this out on its own, automatically, if your reroot in TreeFig.) I haven't found any option in an R package that slides node-labels around as if they were support values when the tree is rerooted (and if there is one, please let me know!) so I had to write my own thing for doing that to the newly created 'simple' consensus tree files... So here's another script. _ #change dir setwd(C:/dave/workspace/mrbayes) library(ape) files-list.files() #get consensus nexus files (ugh regexp) conFiles-files[grep(x=files,pattern=.con.tre)] #get only 'simple' files conFiles-conFiles[grep(x=conFiles,pattern=simple)] conName-sub(.nex.simple,,conFiles) conName-sub(.con.tre,,conName) trees-list() for(i in 1:length(conFiles)){ treesRead-read.nexus(conFiles[i]) treeA-treesRead[[2]] treeB-treesRead[[1]] #now we're going to