Hi Yun,

You might try a clustering algorithm such as blastclust (single-linkage clustering) or mcl (a.k.a. tribe-mcl), or one of the other clustering tools available. I can't think of any EMBOSS apps that would solve this problem, but maybe someone else has a better answer.

Mike
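To make the single-linkage idea concrete, here is a minimal Python sketch. It is not how blastclust or mcl work internally (both operate on BLAST similarity scores); as a stand-in, it assumes two sequences are "linked" when they share an exact word of length k on either strand. The value k=11 and the function names are assumptions made for illustration, and the records are the eight example sequences from the quoted message below.

```python
from collections import defaultdict

def reverse_complement(seq):
    """Return the reverse complement of a lowercase DNA string."""
    return seq.translate(str.maketrans("acgt", "tgca"))[::-1]

def kmers(seq, k):
    """Return the set of all words of length k in seq (empty if seq is shorter than k)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def single_linkage_clusters(records, k=11):
    """Group sequences that are connected by a chain of shared k-mers.

    Assumption: a shared exact k-mer on either strand counts as a link.
    Single linkage means A and C end up together if A links to B and
    B links to C, even when A and C share nothing directly.
    """
    parent = {name: name for name in records}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # k-mer -> name of the first sequence containing it
    for name, seq in records.items():
        seq = seq.lower()
        for word in kmers(seq, k) | kmers(reverse_complement(seq), k):
            if word in first_seen:
                parent[find(name)] = find(first_seen[word])
            else:
                first_seen[word] = name

    clusters = defaultdict(list)
    for name in records:
        clusters[find(name)].append(name)
    return sorted(clusters.values(), key=len, reverse=True)

records = {
    "001": "aaaagttgtgtgtgtatgacaggtt",
    "013": "aacctgtcatacacacacaactttt",
    "289": "gttgtgtgtgtatgacaggtt",
    "375": "tgtgtgtatgacaggttgat",
    "319": "tcaacctgtcatacacaca",
    "177": "cgcagtgtgtgtatgacagg",
    "271": "gtcctacctgtcatacacac",
    "020": "aagacataatgtgtgtatgacag",
}

clusters = single_linkage_clusters(records)
for members in clusters:
    print(len(members), members)
```

Keeping one representative per cluster (for example, the longest member) would then give a non-redundant set. An exact-word criterion is much cruder than a BLAST alignment, so for real data one of the dedicated tools is probably the safer choice.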
On Dec 7, 2006, at 2:36 PM, yun zheng wrote:

> Hi,
>
> Are there any tools for finding unique sequences in a large database?
> Many thanks.
>
> I need to find unique DNA sequences from a large database. A short
> piece is given as follows.
>
> >001
> aaaagttgtgtgtgtatgacaggtt
> >013
> aacctgtcatacacacacaactttt
> >289
> gttgtgtgtgtatgacaggtt
> >375
> tgtgtgtatgacaggttgat
> >319
> tcaacctgtcatacacaca
> >177
> cgcagtgtgtgtatgacagg
> >271
> gtcctacctgtcatacacac
> >020
> aagacataatgtgtgtatgacag
>
> All these seem to be the same sequence, since BLASTN gives very small
> e-values for their alignments.
>
> BLASTN 2.2.8 [Jan-05-2004]
>
> Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
> Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
> "Gapped BLAST and PSI-BLAST: a new generation of protein database search
> programs", Nucleic Acids Res. 25:3389-3402.
>
> Query= 001
>          (25 letters)
>
> Database: drought-clustered.fa
>            410 sequences; 8877 total letters
>
> Searching.done
>
>                                               Score    E
> Sequences producing significant alignments:   (bits)   Value
>
> 013                                           50       8e-11
> 001                                           50       8e-11
> 289                                           42       2e-08
> 375                                           34       5e-06
> 319                                           34       5e-06
> 177                                           32       2e-05
> 271                                           30       8e-05
> 020                                           28       3e-04
>
> Best regards.
>
> Sincerely,
> Zheng, Yun
> Department of Computer Science
> Washington University in St Louis
> Campus Box 1045
> 1 Brookings Drive, St Louis, MO 63130
> _______________________________________________
> EMBOSS mailing list
> [email protected]
> http://lists.open-bio.org/mailman/listinfo/emboss
