[R-sig-phylo] From ClustalW2 Tree to Heat Map in R

2013-01-10 Thread Sarah Pohl
Dear all,

I've stumbled upon a phylogeny problem I can't seem to solve with R, and I'm 
hoping that someone could help me with that.
I have two datasets of the same samples: one with a few specific gene sequences 
and one with expression data for a huge set of genes (almost 3000). I want to 
create phylogenetic trees for both these datasets and then combine them in a 
heat map.

The expression dataset poses no problem; I read in the table, remove rows with 
NAs and then do a pairwise correlation and calculate the distances for the tree:

 expression_NA - read.table(expression_data, header=T, row.names=1)
 sum(apply(!is.na(expression_NA),1,sum)19)
[1] 1144
 expression - expression_NA[!apply(is.na(expression_NA),1,sum),]
 training.genes.cor - cor((expression), use=pairwise.complete.obs, 
 method=pearson)
 training.genes.cor.dist - as.dist(1-training.genes.cor)
 training.genes.tree - hclust(training.genes.cor.dist, method=average, 
 members=NULL)

The other dataset proved to be very difficult, though.
I aligned the sequences using ClustalW2 and also created the phylogenetic tree 
with that tool. I know the ape package can read ClustalW2 tree files (I tried 
both .dnd and .ph), but the tree is neither rooted nor ultrametric, only 
binary. This apparently is a problem when I need to use hclust to prepare the 
data for the heat map.

 align.tree - 
 read.tree(file=phylo_tree.dnd,text=NULL,tree.names=NULL,skip=0,comment.char=#,keep.multi=FALSE)
 str(align.tree)
List of 4
$ edge   : int [1:47, 1:2] 26 27 28 29 30 31 31 30 29 32 ...
$ Nnode  : int 23
$ tip.label  : chr [1:25] UTI_3_KO11_00460_merged UTI_3a_KO11_00460_merged 
pool_20_bc24_KO11_00460_merged UTI_5_KO11_00460_merged ...
$ edge.length: num [1:47] 0.00014 0.00027 0.00087 0.00234 0.00045 0.0007 
0.00091 0.00104 0.00134 0.00031 ...
- attr(*, class)= chr phylo
- attr(*, order)= chr cladewise

Now, I don't know exactly what ultrametric actually is, but from what I found 
online, it seems to be connected with molecular dating, yes? I'm not sure if I 
want that, but I used chronopl anyway, just to make my tree ultrametric.

 align.tree.ultra - chronopl(align.tree, 0, age.min = 1, age.max = NULL, node 
 = root, S = 1, tol = 1e-8,CV = FALSE, eval.max = 500, iter.max = 500)
 fix(align.tree.ultra)
 str(align.tree.ultra)
List of 4
$ edge   : int [1:47, 1:2] 26 27 28 29 30 31 31 30 29 32 ...
$ Nnode  : int 23
$ tip.label  : chr [1:25] UTI_3_KO11_00460_merged UTI_3a_KO11_00460_merged 
pool_20_bc24_KO11_00460_merged UTI_5_KO11_00460_merged ...
$ edge.length: num [1:47] 0.166 0.171 0.169 0.166 0.166 ...
- attr(*, class)= chr phylo
- attr(*, order)= chr cladewise
- attr(*, ploglik)= num -1.05
- attr(*, rates)= num [1:47] 0.000845 0.001583 0.005156 0.014081 0.002711 ...
- attr(*, message)= chr relative convergence (4)

My tree is binary and ultrametric now, but it needs to be rooted as well. I 
don't know what my out-group should be, so I just tried one 
(root(align.tree.ultra,1)), after which my tree was neither rooted not 
ultrametric anymore. I found a trick in the help files:

 align.tree.ultra$root.edge - 0

which managed to root my tree, but then I got this:

 is.ultrametric(align.tree.ultra)
[1] TRUE
 is.binary.tree(align.tree.ultra)
[1] FALSE
 is.rooted(align.tree.ultra)
[1] TRUE

I tried as.hclust.phylo() anyway, just for fun, but all I got was this error 
message:

 test - as.hclust.phylo(align.tree.ultra)
Error in as.hclust.phylo(align.tree.ultra) : the tree is not binary


Does anyone know a workaround for this problem? Why is it so difficult to use 
hclust on an unrooted and/or not ultrametric tree? Is there probably another 
package which can do just that?

Help and/or explanations would be greatly appreciated.

-

Sarah Pohl
PhD student

MOBA-Molecular Bacteriology
Helmholtz Centre for Infection Research
Inhoffenstrasse 7
D-38124 Braunschweig

Tel: +49 531 61 81 - 30 76
Fax: +49 531 61 81 - 30 99
eMail: sarah.p...@helmholtz-hzi.demailto:sarah.p...@helmholtz-hzi.de




Helmholtz-Zentrum f?r Infektionsforschung GmbH | Inhoffenstra?e 7 | 38124 
Braunschweig | www.helmholtz-hzi.de

Vorsitzende des Aufsichtsrates: MinDir'in B?rbel Brumme-Bothe, 
Bundesministerium f?r Bildung und Forschung
Stellvertreter: R?diger Eichel, Abteilungsleiter Nieders?chsisches Ministerium 
f?r Wissenschaft und Kultur
Gesch?ftsf?hrung: Prof. Dr. Dirk Heinz; Ulf Richter, MBA
Gesellschaft mit beschr?nkter Haftung (GmbH)
Sitz der Gesellschaft: Braunschweig
Handelsregister: Amtsgericht Braunschweig, HRB 477

[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/


Re: [R-sig-phylo] From ClustalW2 Tree to Heat Map in R

2013-01-10 Thread Joe Felsenstein

Klaus Schliep wrote:

 There is quite some irony that in phylogenetic reconstruction often
 non-ultrametric methods are preferred, even though the time to the
 last common ancestor (LCA) should be for each extend species the same.
 However other fields use heavily ultrametric methods (hclust) even
 there exist no evident equivalent property like the LCA.

The irony, such as it is, is due to the fact that we can only see amounts of 
difference, not amounts of time, and, darn it, different organisms have 
different biologies so they change at different rates.  Which makes biology 
more interesting but makes molecular change less clocklike.

Joe

Joe Felsenstein j...@gs.washington.edu
Department of Genome Sciences and Department of Biology,
University of Washington, Box 355065, Seattle, WA 98195-5065 USA




[[alternative HTML version deleted]]

___
R-sig-phylo mailing list - R-sig-phylo@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-sig-phylo
Searchable archive at http://www.mail-archive.com/r-sig-phylo@r-project.org/