On 13/10/2011 08:45, Jon Ison wrote:
Hi chaps (Aengus!)

If I understood Aengus' message, what's needed is something that simply combines overlapping hits (for a given pattern) into one or more non-overlapping "regions of hits", and reports those regions, e.g.

  Start  End  Strand  Pattern_name  Mismatch  Sequence
     54   65  +       pattern1             5  GCCAAATAAGGG
    104  115  +       pattern1             5  CCTAAATAAGGG
    179  188  +       pattern1             2  CCTTGCTTGG
    190  200  +       pattern1             6  CCGATTAGAGC

Mismatch in this case reports the sum of the mismatches of the hits before merging. A column for the number of (sub)matches would also be needed. Is that right, Aengus?
I'm not sure that adding the mismatches is sound. I'd assume we would just report the best hit from the overlapping matches.
The above might give a useful result, depending on the input pattern. It would, I think, be easy enough to implement.
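For illustration only, here is a minimal Python sketch of the merging step; it is not EMBOSS code. The names Hit, Region, merge_hits, region_sequence and the policy parameter are hypothetical. It assumes hits use 1-based inclusive coordinates as in the report above, that "overlapping" means sharing at least one position (adjacent hits are not merged), and it offers both options discussed: summing the mismatches or keeping only the best (smallest) one, plus a count of merged hits.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Hit:
    """One pattern hit (hypothetical type); 1-based inclusive coordinates."""
    start: int
    end: int
    strand: str      # "+" or "-"
    pattern: str
    mismatch: int


@dataclass
class Region:
    """A merged region of overlapping hits (hypothetical type)."""
    start: int
    end: int
    strand: str
    pattern: str
    mismatch: int    # summed or best mismatch, depending on the policy
    nhits: int       # number of (sub)matches merged into this region


def merge_hits(hits, policy="best"):
    """Merge overlapping hits of the same pattern and strand into regions.

    policy="sum"  adds up the mismatches of all merged hits;
    policy="best" keeps the smallest mismatch among them.
    """
    if policy not in ("sum", "best"):
        raise ValueError("policy must be 'sum' or 'best'")
    # Sort so each (pattern, strand) group is contiguous and ordered by start.
    ordered = sorted(hits, key=lambda h: (h.pattern, h.strand, h.start, h.end))
    regions = []
    for (pattern, strand), group in groupby(ordered, key=lambda h: (h.pattern, h.strand)):
        current = None
        for h in group:
            if current is not None and h.start <= current.end:
                # Overlaps the open region: extend it and fold in the mismatch.
                current.end = max(current.end, h.end)
                current.nhits += 1
                if policy == "sum":
                    current.mismatch += h.mismatch
                else:
                    current.mismatch = min(current.mismatch, h.mismatch)
            else:
                # No overlap: close the previous region and start a new one.
                if current is not None:
                    regions.append(current)
                current = Region(h.start, h.end, strand, pattern, h.mismatch, 1)
        if current is not None:
            regions.append(current)
    return regions


_COMPLEMENT = str.maketrans("ACGTacgtNn", "TGCAtgcaNn")


def region_sequence(seq, region):
    """Return the region's sequence, reverse-complemented for the '-' strand."""
    sub = seq[region.start - 1:region.end]
    return sub.translate(_COMPLEMENT)[::-1] if region.strand == "-" else sub
```

With two made-up hits at 54-60 (2 mismatches) and 58-65 (3 mismatches) on the + strand, merge_hits gives one region 54-65 with nhits=2 and a mismatch of 2 under "best" or 5 under "sum". Sorting by start means a single left-to-right sweep per pattern and strand is enough, which is the usual way to merge overlapping intervals.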
This is a report output, so post-processing could be done by trimming the results before output using an associated qualifier.
I'm still not sure how useful it would be; we need more feedback from other users on this one, please!
Peter Rice
EMBOSS Team

_______________________________________________
EMBOSS mailing list
http://lists.open-bio.org/mailman/listinfo/emboss
