Dear List,

Hazel Hartman Jenkins wrote:
[corrected]
> If I run the following command;
> fneighbor -datafile tinytest.dat -replicates y -outfile filefrom.fnb
> then everything works.
>
> If, however, my tinytest.dat contains two similarity matrices (or, for
> that matter, the one hundred bootstrap replicates written by fdnadist by
> default), like this;
> 3
> 1187Aquife 0.000000 0.368385 0.404489
> BB213b06 0.368385 0.000000 0.151182
> BB269b06 0.404489 0.151182 0.000000
> 3
> 1187Aquife 0.000000 0.368385 0.404489
> BB213b06 0.368385 0.000000 0.151182
> BB269b06 0.404489 0.151182 0.000000
>
> then fneighbor returns;
> <quote>
> Phylogenies from distance matrix by N-J or UPGMA method
> Segmentation fault
> <endquote>

fneighbor (and ffitch and fkitsch, which have the same bug) should
definitely support multiple input matrices, as the original Phylip
programs do. This matters because it is needed to create bootstrap
values for trees built from distance matrix data.

The desired behaviour is for fneighbor (and ffitch and fkitsch) to
accept input files containing multiple distance matrices and produce
multiple trees from them, in standard nested-parenthesis notation, which
can then be read by fconsense.

Simply stopping after the first distance matrix would not be a fix: the
fault would then be silent, and a user familiar with Phylip might not
notice that the extra matrices had been dropped until many processing
steps later.

I'll explain why it should work that way in a little more detail by
describing how I've used the functionality.

The first step in making a tree with bootstrap values is to create
multiple pseudo-sequences assembled from random samples (with
replacement) of the genetic sequences you want to make into a tree. By
default, both Seqboot (Phylip) and fseqboot (EMBASSY) give one hundred
pseudo-sequences.

The next step is to make one hundred slightly different trees.
Some methods build trees directly from the sequence data.
The methods implemented by Neighbor, Fitch, and Kitsch all build trees
from distance matrices, so first you have to make the hundred distance
matrices. These are calculated from the sequence data using DNAdist. In
EMBASSY, fdnadist calculates one hundred distance matrices from the
hundred pseudo-sequence datasets faultlessly.

Now comes the problem. In Phylip you can feed the
hundred-distance-matrices output from DNAdist directly into Neighbor (or
Fitch or Kitsch), and build your one hundred trees in one command.
EMBASSY currently will only build one at a time; this is inconvenient.

The last step feeds the file containing 100 trees into Consense.
Consense labels each possible subtree (a group all on one branch) with
the number (percentage) of subsamples that include it. You now have
bootstrap values ready to tag onto a tree (which is calculated
separately from /all/ of the sequence data).

I'm afraid I don't know of anyone else using EMBOSS Phylip, but if I can
get it to work I'll pass my script along with my recommendation. I find
it easier to script than Phylip.

Please e-mail me with any questions, or for specific Phylip/EMBASSY
scripts. I have some knowledge of C++, and I'm willing to help with the
coding, but I warn that I'm new to development.

Regards,
Hazel Jenkins <[EMAIL PROTECTED]>
_______________________________________________
EMBOSS mailing list
[email protected]
http://lists.open-bio.org/mailman/listinfo/emboss
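P.S. Until the programs accept multi-matrix input, one possible
workaround for the "one at a time" limitation is to split the
multi-matrix file into one block per matrix and run fneighbor on each in
turn. Below is a minimal Python sketch of the splitting step only. It
assumes the layout shown in the quoted example: a line holding the
number of taxa, followed by that many rows, each on a single line. Real
fdnadist output may wrap long rows across several lines, which this
sketch does not handle. The name split_matrices is hypothetical and is
not part of Phylip or EMBASSY.

```python
def split_matrices(text):
    """Split concatenated Phylip-style distance matrices into blocks.

    Assumption: each matrix starts with a line whose first field is the
    taxon count, followed by exactly that many rows, one row per line
    (no wrapped rows). Blank lines are ignored.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    blocks = []
    i = 0
    while i < len(lines):
        n = int(lines[i].split()[0])      # header line: number of taxa
        rows = lines[i + 1 : i + 1 + n]   # the n rows of this matrix
        if len(rows) != n:
            raise ValueError("matrix %d is truncated" % (len(blocks) + 1))
        blocks.append("\n".join([lines[i]] + rows) + "\n")
        i += 1 + n                        # jump to the next header line
    return blocks


# The two-matrix example from the quoted message.
example = """\
3
1187Aquife 0.000000 0.368385 0.404489
BB213b06 0.368385 0.000000 0.151182
BB269b06 0.404489 0.151182 0.000000
3
1187Aquife 0.000000 0.368385 0.404489
BB213b06 0.368385 0.000000 0.151182
BB269b06 0.404489 0.151182 0.000000
"""
matrices = split_matrices(example)
```

Each block could then be written to its own file and passed to fneighbor
with -datafile, as in the command quoted above, and the resulting trees
concatenated into one file for fconsense. I have not tested that part
against EMBASSY, so treat it as an outline, not a recipe.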
