Hi Peter R. et al,

I gather EMBOSS is looking for feedback for new applications (given the recent funding from the BBSRC - congratulations again). How about suggestions for extensions to existing EMBOSS applications?
I've used bits of EMBOSS for several years now (thank you!). Something I have sometimes wanted to do is a many-to-many pairwise sequence alignment with the EMBOSS tools needle and water. Right now, needle and water take two files (here referred to as A and B): file A has just one sequence, while file B can have one or more sequences. I'd like to be able to supply two files that both contain multiple entries, and have needle/water align every sequence in A against every sequence in B. This might be useful for finding reciprocal best hits in comparative genomics (as a slower but exact alternative to FASTA or BLAST).

From an implementation point of view, I imagine aligning sequence A1 against all of B, then sequence A2 against all of B, and so on. This would require looping over file B many times (easy if it is on disk). This would also work if the A input were stdin, but having the B input on stdin would require caching the data if A has more than one sequence :(

It may sometimes also be useful to have an all-against-all pairwise comparison for a single set of sequences. The enhancement suggested above would let you do this by comparing file A to file A. However, here you only really need half of the possible combinations, since aligning sequence A1 to sequence A2 should give the same result as aligning A2 to A1. This could be useful for implementing a basic clustering algorithm, or maybe as part of a worked example in building a simple NJ tree?

So, does supporting many-to-many comparisons sound like a useful enhancement to needle and water? I should stress this isn't something I need right now. Also, it can be worked around with a wrapper script that calls needle/water once for each sequence in file A (against all the sequences in file B), with the added bonus that these one-to-many jobs can then be shared across multiple CPU cores.

Regards,
Peter C.
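The iteration order proposed in the post (each A sequence against all of B, rewinding B when it is a file on disk and caching it when it arrives on stdin) can be sketched in Python. This is only a sketch of the loop structure under stated assumptions: inputs are plain FASTA, `read_fasta` and `many_to_many` are hypothetical helpers rather than EMBOSS code, and no alignment is performed. Each yielded pair marks where a needle/water alignment would run.

```python
import io
from typing import Iterator, List, Tuple

Record = Tuple[str, str]  # (sequence id, residues)


def read_fasta(handle: io.TextIOBase) -> Iterator[Record]:
    """Very small FASTA reader (hypothetical stand-in for EMBOSS sequence input)."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # keep only the first word of the header as the id
            name, chunks = (line[1:].split() or [""])[0], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def many_to_many(a: io.TextIOBase, b: io.TextIOBase) -> Iterator[Tuple[Record, Record]]:
    """Yield (A_i, B_j) in the order A1 x all of B, A2 x all of B, ...

    a and b must be separate handle objects, because b is rewound while
    a is still being read.
    """
    if b.seekable():
        # B is an ordinary file: rewind and re-read it once per A sequence.
        for rec_a in read_fasta(a):
            b.seek(0)
            for rec_b in read_fasta(b):
                yield rec_a, rec_b
    else:
        # B is a pipe or stdin: it can only be read once, so cache it in memory.
        b_cache: List[Record] = list(read_fasta(b))
        for rec_a in read_fasta(a):
            for rec_b in b_cache:
                yield rec_a, rec_b
```

Only B ever needs rewinding or caching; A is streamed exactly once, which matches the observation that A on stdin is the easy case.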
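For the all-against-all case, aligning only one of (A1, A2) and (A2, A1) means n*(n-1)/2 alignments instead of n*n; for 4 sequences that is 6 rather than 16. The sketch below assumes, as the post does, that the two orders give the same result. For the optimal score this holds when the substitution matrix is symmetric, though tie-breaking may pick a different, equally scoring alignment. `score_pair` is a hypothetical placeholder for running needle/water and extracting a score or distance, and self-comparisons are skipped, which suits a distance matrix for clustering or NJ.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple


def unique_pairs(ids: Sequence[str]) -> List[Tuple[str, str]]:
    """Every unordered pair once (i < j): n*(n-1)/2 alignments, not n*n."""
    return list(combinations(ids, 2))


def symmetric_matrix(ids: Sequence[str],
                     score_pair: Callable[[str, str], float]) -> List[List[float]]:
    """Fill an n x n matrix using one alignment per unordered pair.

    score_pair is a hypothetical stand-in for running needle/water on two
    sequences and reading a score or distance from the report.
    """
    n = len(ids)
    matrix = [[0.0] * n for _ in range(n)]  # diagonal (self-comparison) left at 0.0
    for i, j in combinations(range(n), 2):
        value = score_pair(ids[i], ids[j])
        matrix[i][j] = matrix[j][i] = value  # mirror instead of re-aligning j vs i
    return matrix
```

If self-alignments are also wanted (for example to normalise scores), `itertools.combinations_with_replacement` gives the n*(n+1)/2 pairs including the diagonal.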
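The wrapper-script workaround mentioned in the post could look roughly like the following: split A into single-sequence files, build one needle call per A sequence against the whole of B, and run those independent jobs concurrently. This is a sketch, not a tested tool. It assumes needle's `-asequence`, `-bsequence`, `-gapopen`, `-gapextend` and `-outfile` qualifiers (check `needle -help` for your EMBOSS version), uses illustrative gap penalties, and the helper names `needle_commands` and `run_parallel` are hypothetical. The same approach applies to water.

```python
import os
import subprocess
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Sequence, Tuple


def needle_commands(a_records: Sequence[Tuple[str, str]], b_path: str,
                    workdir: Optional[str] = None) -> List[List[str]]:
    """Build one needle command per sequence in A, each aligned to all of B.

    Each A sequence gets its own single-entry FASTA file, since -asequence
    takes one sequence. Sequence ids are assumed to be safe as file names.
    """
    workdir = workdir or tempfile.mkdtemp(prefix="needle_jobs_")
    commands = []
    for seq_id, residues in a_records:
        a_path = os.path.join(workdir, f"{seq_id}.fasta")
        with open(a_path, "w") as fh:
            fh.write(f">{seq_id}\n{residues}\n")
        commands.append([
            "needle",
            "-asequence", a_path,
            "-bsequence", b_path,   # every sequence in B in a single call
            "-gapopen", "10.0",     # illustrative values, given explicitly
            "-gapextend", "0.5",    # so needle does not prompt for them
            "-outfile", os.path.join(workdir, f"{seq_id}.needle"),
        ])
    return commands


def _run_needle(argv: List[str]) -> int:
    """Run one command, raising CalledProcessError if it fails."""
    return subprocess.run(argv, check=True).returncode


def run_parallel(commands: Sequence[List[str]],
                 runner: Callable[[List[str]], int] = _run_needle,
                 workers: Optional[int] = None) -> List[int]:
    """Run the independent one-to-many jobs concurrently (default: one per core).

    Threads suffice here because the real work happens in the needle
    subprocesses, not in Python.
    """
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(runner, commands))
```

Passing a different `runner` (even `len`) lets the generated command lines be inspected without EMBOSS installed; results come back in the same order as the A sequences.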
_______________________________________________
EMBOSS mailing list
http://lists.open-bio.org/mailman/listinfo/emboss
