Dear Karen,

> I am currently using needle to generate an alignment between two
> sequences which contain non-informative bases (ie, identified low
> quality bases (phred scores) and have been changed to "N").
> Presently, these bases are penalized as any other non-matching
> character. Is there any way to change needle to "overlook" these
> bases when generating the best scoring alignment (or, do I need to
> write my own version of needle?)

There are two matrix files for nucleotide comparisons. The default is EDNAFULL, which scores N as an average of all possible scores (1 match against 3 possible mismatches). The alternative is EDNAMAT, which only scores exact matches, like blastn (use -data EDNAMAT on the command line to see the difference).

You can also copy EDNAFULL to your local directory and rename it:

    embossdata EDNAFULL -fetch
    mv EDNAFULL EDNAPHRED

(It is best to do this rename, or you will accidentally be using the local file by default for other needle runs in the same directory.)

Edit EDNAPHRED to have the scores you want for N (perhaps +1 for a small match to ACGTU, +2 for a match to a 2-base code RYSWKM, +3 for a match to a 3-base code BDHV, and +4 for a match to another N). Then run with:

    needle -data EDNAPHRED

If enough users think this is a meaningful scoring system, we could add such a matrix to the distribution. Let us know if it really gives you more useful scores.

My natural prejudice is to trust EDNAFULL. I guess you are expecting to often find that the base in the other sequence is the one phred started with, which will indeed bias the scoring.

Hope this helps,

Peter

_______________________________________________
EMBOSS mailing list
http://lists.open-bio.org/mailman/listinfo/emboss
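
The hand edit of the N scores suggested above could also be scripted. What follows is only a minimal sketch in Python, not part of EMBOSS. The function name rescore_n and the toy matrix are invented for illustration. It assumes the matrix file is whitespace-separated text with "#" comment lines, a header row of column symbols, and one row per symbol starting with that symbol; check this against your own fetched copy before relying on it.

```python
# Proposed N scores from the message, keyed by the symbol N is compared with.
N_SCORES = {}
N_SCORES.update({c: 1 for c in "ACGTU"})   # single bases
N_SCORES.update({c: 2 for c in "RYSWKM"})  # 2-base ambiguity codes
N_SCORES.update({c: 3 for c in "BDHV"})    # 3-base ambiguity codes
N_SCORES["N"] = 4                          # N against N


def rescore_n(matrix_text):
    """Return matrix_text with the N row and N column replaced by N_SCORES.

    Hypothetical helper: assumes '#' comments, then a header row of column
    symbols, then rows of the form '<symbol> <score> <score> ...'.
    """
    out = []
    columns = None
    for line in matrix_text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            out.append(line)  # keep comments and blank lines untouched
            continue
        fields = line.split()
        if columns is None:
            columns = fields  # first non-comment line is the column header
            out.append(line)
            continue
        row, scores = fields[0], fields[1:]
        for i, col in enumerate(columns):
            if row == "N":
                scores[i] = str(N_SCORES.get(col, scores[i]))
            elif col == "N":
                scores[i] = str(N_SCORES.get(row, scores[i]))
        out.append(row + "".join(f"{s:>4}" for s in scores))
    return "\n".join(out) + "\n"


# Toy 3x3 excerpt for demonstration only; this is NOT the real EDNAFULL.
TOY = """\
# toy excerpt, values are illustrative
    A   R   N
A   5   1  -2
R   1  -1  -1
N  -2  -1  -1
"""

print(rescore_n(TOY))
```

To apply the same idea to a real file, one could read the text of the local EDNAPHRED copy, pass it through rescore_n, and write the result back before running needle -data EDNAPHRED. Symbols that are not listed in N_SCORES keep their original scores.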
