Dear all,
I've stumbled upon a phylogeny problem I can't seem to solve with R, and I'm
hoping that someone could help me with that.
I have two datasets of the same samples: one with a few specific gene sequences
and one with expression data for a huge set of genes (almost 3000). I want to
create phylogenetic trees for both these datasets and then combine them in a
heat map.
The expression dataset poses no problem; I read in the table, remove rows with
NAs and then do a pairwise correlation and calculate the distances for the tree:
expression_NA - read.table(expression_data, header=T, row.names=1)
sum(apply(!is.na(expression_NA),1,sum)19)
[1] 1144
expression - expression_NA[!apply(is.na(expression_NA),1,sum),]
training.genes.cor - cor((expression), use=pairwise.complete.obs,
method=pearson)
training.genes.cor.dist - as.dist(1-training.genes.cor)
training.genes.tree - hclust(training.genes.cor.dist, method=average,
members=NULL)
The other dataset proved to be very difficult, though.
I aligned the sequences using ClustalW2 and also created the phylogenetic tree
with that tool. I know the ape package can read ClustalW2 tree files (I tried
both .dnd and .ph), but the tree is neither rooted nor ultrametric, only
binary. This apparently is a problem when I need to use hclust to prepare the
data for the heat map.
align.tree -
read.tree(file=phylo_tree.dnd,text=NULL,tree.names=NULL,skip=0,comment.char=#,keep.multi=FALSE)
str(align.tree)
List of 4
$ edge : int [1:47, 1:2] 26 27 28 29 30 31 31 30 29 32 ...
$ Nnode : int 23
$ tip.label : chr [1:25] UTI_3_KO11_00460_merged UTI_3a_KO11_00460_merged
pool_20_bc24_KO11_00460_merged UTI_5_KO11_00460_merged ...
$ edge.length: num [1:47] 0.00014 0.00027 0.00087 0.00234 0.00045 0.0007
0.00091 0.00104 0.00134 0.00031 ...
- attr(*, class)= chr phylo
- attr(*, order)= chr cladewise
Now, I don't know exactly what ultrametric actually is, but from what I found
online, it seems to be connected with molecular dating, yes? I'm not sure if I
want that, but I used chronopl anyway, just to make my tree ultrametric.
align.tree.ultra - chronopl(align.tree, 0, age.min = 1, age.max = NULL, node
= root, S = 1, tol = 1e-8,CV = FALSE, eval.max = 500, iter.max = 500)
fix(align.tree.ultra)
str(align.tree.ultra)
List of 4
$ edge : int [1:47, 1:2] 26 27 28 29 30 31 31 30 29 32 ...
$ Nnode : int 23
$ tip.label : chr [1:25] UTI_3_KO11_00460_merged UTI_3a_KO11_00460_merged
pool_20_bc24_KO11_00460_merged UTI_5_KO11_00460_merged ...
$ edge.length: num [1:47] 0.166 0.171 0.169 0.166 0.166 ...
- attr(*, class)= chr phylo
- attr(*, order)= chr cladewise
- attr(*, ploglik)= num -1.05
- attr(*, rates)= num [1:47] 0.000845 0.001583 0.005156 0.014081 0.002711 ...
- attr(*, message)= chr relative convergence (4)
My tree is binary and ultrametric now, but it needs to be rooted as well. I
don't know what my out-group should be, so I just tried one
(root(align.tree.ultra,1)), after which my tree was neither rooted not
ultrametric anymore. I found a trick in the help files:
align.tree.ultra$root.edge - 0
which managed to root my tree, but then I got this:
is.ultrametric(align.tree.ultra)
[1] TRUE
is.binary.tree(align.tree.ultra)
[1] FALSE
is.rooted(align.tree.ultra)
[1] TRUE
I tried as.hclust.phylo() anyway, just for fun, but all I got was this error
message:
test - as.hclust.phylo(align.tree.ultra)
Error in as.hclust.phylo(align.tree.ultra) : the tree is not binary
Does anyone know a workaround for this problem? Why is it so difficult to use
hclust on an unrooted and/or not ultrametric tree? Is there probably another
package which can do just that?
Help and/or explanations would be greatly appreciated.
-
Sarah Pohl
PhD student
MOBA-Molecular Bacteriology
Helmholtz Centre for Infection Research
Inhoffenstrasse 7
D-38124 Braunschweig
Tel: +49 531 61 81 - 30 76
Fax: +49 531 61 81 - 30 99
eMail: sarah.p...@helmholtz-hzi.demailto:sarah.p...@helmholtz-hzi.de
Helmholtz-Zentrum f?r Infektionsforschung GmbH | Inhoffenstra?e 7 | 38124
Braunschweig | www.helmholtz-hzi.de
Vorsitzende des Aufsichtsrates: MinDir'in B?rbel Brumme-Bothe,
Bundesministerium f?r Bildung und Forschung
Stellvertreter: R?diger Eichel, Abteilungsleiter Nieders?chsisches Ministerium
f?r Wissenschaft und Kultur
Gesch?ftsf?hrung: Prof. Dr. Dirk Heinz; Ulf Richter, MBA
Gesellschaft mit beschr?nkter Haftung (GmbH)
Sitz der Gesellschaft: Braunschweig
Handelsregister: Amtsgericht Braunschweig, HRB 477
[[alternative HTML version deleted]]
___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/