Re: [R-sig-phylo] Plotting Posterior Probabilities from MrBayes Trees (An Update)

2014-10-08 Thread David Bapst
Hello all,

Unable to get phyloch's read.mrbayes() to read my .con.tre files (for
reasons unknown), I've gone with Graham's suggestion to use the simple
output format from sumt. However, as I have a large number of analyses
which took considerable computational time (and I am stubborn to learn
how to run these on a cluster) I decided the more economical thing to
do was to write an R script that took the existing nexus files and
wrote a nexus file with MrBayes blocks that would output 'simple'
format consensus trees. I realize such string manipulation might be
easier in some other language, but I know R best.

Here it is, in case anyone else ever needs such a thing:

_

#change to dir for MB files
setwd(C:/dave/workspace/mrbayes)

#get files
files-list.files()
#get tree files
treeFiles-files[grep(x=files,pattern=run..t)]
#get the nexus files
inputs-sort(unique(sub(.run..t,,treeFiles)))
inputs-sort(unique(sub(.tree.,,inputs)))

#now start constructing sumt lines
bayesBlock-#NEXUS
for(i in 1:length(inputs)){
origNex-scan(inputs[i],what=character,sep=\n,blank.lines.skip=FALSE)
cropNex-origNex[2:grep(origNex,pattern=mcmcp)]
#remove log line
cropNex-cropNex[-grep(cropNex,pattern=log start)]
sumtLine-paste0(sumt filename=,inputs[i], outputname=,
paste0(inputs[i],.simple), conformat=simple;)
bayesBlock-c(bayesBlock,cropNex,sumtLine,,end;,)
}

write(bayesBlock,file=simple_sumt.nex)

___

Some caveats: this script assumes you already have mrbayes blocks in
your nexus files (all of which are stored with all output in the same
directory), including a line where you set MCMC parameters (mcmcp)
just prior to calling the MCMC itself and after any other
modifications to the dataset, model or prior settings.

If you execute the nexus file created, it should run sumt and output
new output files with 'simple' in the file name. Analyses where there
are multiple trees (such as partitioned analyses with unlinked branch
lengths) will have multiple consensus trees estimated for them, as
expected if we had originally done this immediately after running
mcmc. Also note I crop out any lines that create new log files,
because I didn't want to write over my old log files.

The resulting simple-format consensus tree files can be read into ape
with read.nexus just fine. For whatever reason, these nexus files
contain two trees, otherwise identical other than that one contains
the posterior probabilities as its node-labels.

However! There is a big however here.

If we re-root the tree in ape (and there are many reasons why we might
want to do so), the posterior probabilities, which are stored as node
labels, no longer make sense as those node labels are not properties
technically of the nodes but of the edges preceding those nodes.
Change the directionality of the graph by rerooting and those
posterior probabilities end up in the wrong place. (TreeFig apparently
figures this out on its own, automatically, if your reroot in
TreeFig.)

I haven't found any option in an R package that slides node-labels
around as if they were support values when the tree is rerooted (and
if there is one, please let me know!) so I had to write my own thing
for doing that to the newly created 'simple' consensus tree files...

So here's another script.

_

#change dir
setwd(C:/dave/workspace/mrbayes)

library(ape)

files-list.files()
#get consensus nexus files (ugh regexp)
conFiles-files[grep(x=files,pattern=.con.tre)]
#get only 'simple' files
conFiles-conFiles[grep(x=conFiles,pattern=simple)]
conName-sub(.nex.simple,,conFiles)
conName-sub(.con.tre,,conName)

trees-list()
for(i in 1:length(conFiles)){
treesRead-read.nexus(conFiles[i])
treeA-treesRead[[2]]
treeB-treesRead[[1]]
#now we're going to reroot and let's use fake species names hahah
treeA-root(treeA,resolve.root=TRUE,outgroup=c(OutOfTheGroup,OutThere,
outgroupy,groupout))
#okay basically need to figure out
#which taxa are included by a node
#and match between treeB and tree A
splitsB-lapply(prop.part(treeB),function(x)
if(any(x==1)){treeB$tip.label[x]
}else{treeB$tip.label[-x]})
splitsA-lapply(prop.part(treeA),function(x)
if(any(x==1)){treeA$tip.label[x]
}else{treeA$tip.label[-x]})
#plot(treeB,show.node.label=TRUE)
treeA$node.label-treeB$node.label[match(splitsA,splitsB)]
trees[[i]]-treeA
}
class(trees)-multiPhylo

#plot(trees)

pdf(consensus_trees_10-07-14.pdf,height=8,width=8)
for(i in 1:length(trees)){
plot(trees[[i]],main=conName[i],show.node.label=TRUE)
add.scale.bar()
}
dev.off()

_

The last bit just plots it in a pdf, which isn't important at all but
I included it for the heck of it. The important stuff is the innards
of the first for loop, which reads in 

Re: [R-sig-phylo] Plotting Posterior Probabilities from MrBayes Trees (An Update)

2014-10-08 Thread Klaus Schliep
Hi all,

here is a small function which it is very similar to David's 2nd script,
but it is a bit more compact:
library(phangorn) # requires the newest pangorn version = 1.99-8
addConfidences.phylo - function(to, from){
conf = attr(addConfidences(as.splits(to), from), confidences)
nTips = length(to$tip.label)
to$node.label = conf[-c(1:nTips)]
to
}
treeA - addConfidences.phylo(treeA, treeB)

I will fix also my midpoint rooting function.

And some comments on possible problems / features of the 2nd script:
The function assumes that the ordering of the tip labels is identical with
the effect

t3 t4

is different to

t4 t3

A lapply(splitsA, sort) and lapply(splitsB, sort) should do the trick.
Now the feature:
let assume there a 5 taxa  t1 t2 t3 t4 t5
A clade containing t1 t2
will be different from t3 t4 t5.
If you want just to reroot a tree, you probably do not want to distinguish
clades but only splits.

Regards,
Klaus


On Wed, Oct 8, 2014 at 10:51 AM, David Bapst dwba...@gmail.com wrote:

 Hello all,

 Unable to get phyloch's read.mrbayes() to read my .con.tre files (for
 reasons unknown), I've gone with Graham's suggestion to use the simple
 output format from sumt. However, as I have a large number of analyses
 which took considerable computational time (and I am stubborn to learn
 how to run these on a cluster) I decided the more economical thing to
 do was to write an R script that took the existing nexus files and
 wrote a nexus file with MrBayes blocks that would output 'simple'
 format consensus trees. I realize such string manipulation might be
 easier in some other language, but I know R best.

 Here it is, in case anyone else ever needs such a thing:

 _

 #change to dir for MB files
 setwd(C:/dave/workspace/mrbayes)

 #get files
 files-list.files()
 #get tree files
 treeFiles-files[grep(x=files,pattern=run..t)]
 #get the nexus files
 inputs-sort(unique(sub(.run..t,,treeFiles)))
 inputs-sort(unique(sub(.tree.,,inputs)))

 #now start constructing sumt lines
 bayesBlock-#NEXUS
 for(i in 1:length(inputs)){

 origNex-scan(inputs[i],what=character,sep=\n,blank.lines.skip=FALSE)
 cropNex-origNex[2:grep(origNex,pattern=mcmcp)]
 #remove log line
 cropNex-cropNex[-grep(cropNex,pattern=log start)]
 sumtLine-paste0(sumt filename=,inputs[i], outputname=,
 paste0(inputs[i],.simple), conformat=simple;)
 bayesBlock-c(bayesBlock,cropNex,sumtLine,,end;,)
 }

 write(bayesBlock,file=simple_sumt.nex)

 ___

 Some caveats: this script assumes you already have mrbayes blocks in
 your nexus files (all of which are stored with all output in the same
 directory), including a line where you set MCMC parameters (mcmcp)
 just prior to calling the MCMC itself and after any other
 modifications to the dataset, model or prior settings.

 If you execute the nexus file created, it should run sumt and output
 new output files with 'simple' in the file name. Analyses where there
 are multiple trees (such as partitioned analyses with unlinked branch
 lengths) will have multiple consensus trees estimated for them, as
 expected if we had originally done this immediately after running
 mcmc. Also note I crop out any lines that create new log files,
 because I didn't want to write over my old log files.

 The resulting simple-format consensus tree files can be read into ape
 with read.nexus just fine. For whatever reason, these nexus files
 contain two trees, otherwise identical other than that one contains
 the posterior probabilities as its node-labels.

 However! There is a big however here.

 If we re-root the tree in ape (and there are many reasons why we might
 want to do so), the posterior probabilities, which are stored as node
 labels, no longer make sense as those node labels are not properties
 technically of the nodes but of the edges preceding those nodes.
 Change the directionality of the graph by rerooting and those
 posterior probabilities end up in the wrong place. (TreeFig apparently
 figures this out on its own, automatically, if your reroot in
 TreeFig.)

 I haven't found any option in an R package that slides node-labels
 around as if they were support values when the tree is rerooted (and
 if there is one, please let me know!) so I had to write my own thing
 for doing that to the newly created 'simple' consensus tree files...

 So here's another script.

 _

 #change dir
 setwd(C:/dave/workspace/mrbayes)

 library(ape)

 files-list.files()
 #get consensus nexus files (ugh regexp)
 conFiles-files[grep(x=files,pattern=.con.tre)]
 #get only 'simple' files
 conFiles-conFiles[grep(x=conFiles,pattern=simple)]
 conName-sub(.nex.simple,,conFiles)
 conName-sub(.con.tre,,conName)

 trees-list()
 for(i in 1:length(conFiles)){
 treesRead-read.nexus(conFiles[i])
 treeA-treesRead[[2]]
 treeB-treesRead[[1]]
 #now we're going to