Peter Cock or biopython wrote:
> Hi Peter R. et al,
>
> I gather EMBOSS is looking for feedback for new applications (given
> the recent funding from the BBSRC - congratulations again). How about
> suggestions for extensions to existing EMBOSS applications?
>
> I've used bits of EMBOSS for several years now (thank you!). Something
> I have sometimes wanted to do is a many-to-many pairwise sequence
> alignment with the EMBOSS tools needle and water.
>
> Right now, needle and water take two files (here referred to as A and
> B), file A has just one sequence, and file B can have one or more
> sequences. I'd like to be able to supply two files both with multiple
> entries, and have needle/water do pairwise alignments between all the
> sequences in A against all the sequences in B. This might be useful
> for finding reciprocal best hits in comparative genomics (as a slower
> but exact alternative to FASTA or BLAST).

The application is easy to add (after the release).

The usual problem with all-against-all is that it involves loading one of
the inputs as a sequence set entirely in memory, to avoid reading one
input many times over.

We have an application, supermatcher, which does this: the first sequence
is streamed through, and the second is a sequence set loaded into memory.
It uses word matching to find seed alignments, then runs a limited
alignment around the hits.

superwater would be a possible name (or superneedle).

How popular would such a program be? How large would the smaller input
set be?

regards,

Peter

_______________________________________________
EMBOSS mailing list
http://lists.open-bio.org/mailman/listinfo/emboss
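
The memory trade-off described in the reply (stream one input, hold the other as a sequence set in memory) can be sketched in a few lines of Python. This is only an illustration, not EMBOSS code: `read_fasta`, `global_score` and `all_against_all` are hypothetical names, and the scoring is a toy Needleman-Wunsch with a linear gap penalty rather than the substitution matrix and affine gaps that needle uses.

```python
from io import StringIO
from typing import Iterator, List, TextIO, Tuple

Record = Tuple[str, str]  # (identifier, sequence)


def read_fasta(handle: TextIO) -> Iterator[Record]:
    """Yield (id, sequence) records one at a time from a FASTA handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def global_score(a: str, b: str, match: int = 1,
                 mismatch: int = -1, gap: int = -2) -> int:
    """Toy global alignment score (linear gaps), keeping two DP rows."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * gap]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def all_against_all(stream_handle: TextIO,
                    set_handle: TextIO) -> Iterator[Tuple[str, str, int]]:
    """Align every streamed sequence against every in-memory sequence."""
    # The second input is read once and kept entirely in memory.
    in_memory: List[Record] = list(read_fasta(set_handle))
    # The first input is streamed: only one of its records is held at a time.
    for id_a, seq_a in read_fasta(stream_handle):
        for id_b, seq_b in in_memory:
            yield id_a, id_b, global_score(seq_a, seq_b)


if __name__ == "__main__":
    file_a = StringIO(">a1\nACGT\n>a2\nAGT\n")
    file_b = StringIO(">b1\nACGT\n>b2\nTTTT\n")
    for id_a, id_b, score in all_against_all(file_a, file_b):
        print(id_a, id_b, score)
```

In this arrangement each input is read exactly once, and peak memory depends on the size of the set held in memory, which is why the size of the smaller input matters.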
