On Thu, Jul 9, 2009 at 10:08 AM, Peter Rice wrote:
>
> Peter C. wrote:
>> However, consider the codon TRR. R means A or G, so this can mean TAA,
>> TGA, TAG or TGG which translate to stop or W (both EMBOSS and the NCBI
>> standard table agree here). Therefore the translation of TRR should be
>> "* or W", which I would expect based on the above examples to result
>> in "X". But instead EMBOSS transeq gives "*":
>
> This is a side effect of the way backtranslation works...
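For reference, here is a minimal Python sketch of the "expand and compare" behaviour these examples assume. It is not EMBOSS code (the reply above says transeq works via backtranslation), and every name in it (`STANDARD_TABLE`, `IUPAC`, `AMBIGUOUS_AA`, `translate_codon`, `translate`) is hypothetical. Each IUPAC base is expanded into its concrete bases, every resulting codon is translated with NCBI table 1, and the function returns the single residue if they all agree, B, Z or J for the two-residue sets D/N, E/Q and I/L, and X otherwise. Under this approach TRR gives X.

```python
from itertools import product

# NCBI translation table 1 (standard code); codons in the order
# TTT, TTC, TTA, TTG, TCT, ... with the third base varying fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD_TABLE = {
    a + b + c: aa
    for (a, b, c), aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC nucleotide ambiguity codes and the concrete bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Optional ambiguous amino-acid letters for two-residue sets.
# Remove entries here to fall back to X, as transeq currently does.
AMBIGUOUS_AA = {
    frozenset("DN"): "B",  # Asx
    frozenset("EQ"): "Z",  # Glx
    frozenset("IL"): "J",  # Xle
}


def translate_codon(codon):
    """Translate one possibly ambiguous codon by expanding every IUPAC base."""
    choices = [IUPAC[base] for base in codon.upper()]
    residues = {STANDARD_TABLE["".join(bases)] for bases in product(*choices)}
    if len(residues) == 1:
        return residues.pop()  # all expansions agree, e.g. TRA -> "*"
    return AMBIGUOUS_AA.get(frozenset(residues), "X")


def translate(seq):
    """Translate a sequence codon by codon, ignoring any trailing partial codon."""
    return "".join(translate_codon(seq[i:i + 3]) for i in range(0, len(seq) - 2, 3))


print(translate_codon("TRR"))   # X   (stop or W)
print(translate("TAATGATRA"))   # ***
print(translate("ATCATGATS"))   # IMX
print(translate("AATGATRAT"))   # NDB
```

Using B, Z and J here is the choice the post below calls debatable; emptying `AMBIGUOUS_AA` gives the more conservative X for those cases while keeping the single-residue results.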
OK, leaving TRR aside for the moment (I'm not sure I'd have done it that way, but I think I follow your logic), I have some more problem cases for you to consider (all using the default standard NCBI table 1). Most of these are 'unambiguous ambiguous codons', as you put it, and I would agree that using X when a more specific letter is possible isn't ideal but isn't actually wrong. The "ATS" and related codons (see below), however, are simply wrong.

--------------------------------------------------------------------------------------

TRA means TAA or TGA, which are both stop codons. Therefore TRA should translate as a stop, not as an X:

$ transeq asis:TAATGATRA -stdout -auto -osformat raw
**X

--------------------------------------------------------------------------------------

Now look at YTA, which means CTA or TTA, both of which encode L, so YTA should be L, not X:

$ transeq asis:CTATTAYTA -stdout -auto -osformat raw
LLX

Likewise for YTG and YTR. (YTN is different: it also covers TTT and TTC, which encode F, so X is the right answer there.)

--------------------------------------------------------------------------------------

Another example: ATW means ATA or ATT, which both translate as I, so ATW should translate as I, not X:

$ transeq asis:ATAATTATW -stdout -auto -osformat raw
IIX

--------------------------------------------------------------------------------------

Conversely, ATS means ATC or ATG (remember S means G or C), which translate as I and M. Therefore ATS should translate as X, not I:

$ transeq asis:ATCATGATS -stdout -auto -osformat raw
IMI

[*** This one strikes me as a clear bug ***]

Note that H means A, C or T, so ATH (ATA, ATC or ATT) is all isoleucine, and I is correct for it:

$ transeq asis:ATAATCATGATH -stdout -auto -osformat raw
IIMI

The codons that, like ATS, mix ATG with isoleucine codons (e.g. ATR, ATK, and ATV, where V means A, C or G) are the ones that should give X.

--------------------------------------------------------------------------------------

Now for another debatable one: RAT means AAT or GAT, which code for N and D, so you could use B (Asx) here rather than the broader X.
$ transeq asis:AATGATRAT -stdout -auto -osformat raw
NDX

Again, the same goes for others such as RAC (X, not B) and RAY (X, not B). Similarly, you don't use J to mean leucine (L) or isoleucine (I), and opt for X instead (again, this is justifiable), e.g. WTA:

$ transeq asis:ATATTAWTA -stdout -auto -osformat raw
ILX

--------------------------------------------------------------------------------------

This list is only partial, and only covers the standard table.

Peter C.

_______________________________________________
EMBOSS mailing list
http://lists.open-bio.org/mailman/listinfo/emboss
