Hi again,

Now that I have installed the latest and greatest version, EMBOSS 6.3.1, I'm revisiting some old issues I had with EMBOSS. In this case, that means the 'unambiguous ambiguous codons' and other translation issues.

On Fri, Jul 10, 2009 at 10:14 AM, Peter C. wrote:
> On Thu, Jul 9, 2009 at 10:08 AM, Peter Rice wrote:
>>
>> Peter C. wrote:
>>> However, consider the codon TRR. R means A or G, so this can mean TAA,
>>> TGA, TAG or TGG which translate to stop or W (both EMBOSS and the NCBI
>>> standard table agree here). Therefore the translation of TRR should be
>>> "* or W", which I would expect based on the above examples to result
>>> in "X". But instead EMBOSS transeq gives "*":
>>
>> This is a side effect of the way backtranslation works...
>
> OK, leaving TRR aside for the moment (I'm not sure I'd have done it that
> way, but I think I follow your logic), I have some more problem cases for
> you to consider (all using the default standard NCBI table 1).
>
> Most of these are 'unambiguous ambiguous codons' as you put it, and
> I would agree using X when a more specific letter is possible isn't ideal
> but isn't actually wrong. The "ATS" and related codons (see below)
> however are simply wrong.
>
> --------------------------------------------------------------------------------------
>
> TRA means TAA or TGA, which are both stop codons. Therefore TRA
> should translate as a stop, not as an X:
>
> $ transeq asis:TAATGATRA -stdout -auto -osformat raw
> **X

Same on EMBOSS 6.3.1, shouldn't TRA translate as stop?

> --------------------------------------------------------------------------------------
>
> Now look at YTA, which means CTA or TTA which encode L, so
> YTA should be L not X:
>
> $ transeq asis:CTATTAYTA -stdout -auto -osformat raw
> LLX

Same on EMBOSS 6.3.1, giving X instead of specific amino acid (i.e. YTA is an "unambiguous ambiguous codon" for L)

> Likewise for YTG and YTR, and YTN.

I haven't re-checked these.

> --------------------------------------------------------------------------------------
>
> Another example, ATW means ATA or ATT, which both translate as I,
> so ATW should translate as I not X:
>
> $ transeq asis:ATAATTATW -stdout -auto -osformat raw
> IIX

Same on EMBOSS 6.3.1, giving X instead of specific amino acid (i.e. ATW is an "unambiguous ambiguous codon" for I)

> --------------------------------------------------------------------------------------
>
> Conversely, ATS which means ATC or ATG which translate as I and M.
> Remember S means G or C. Therefore ATS should translate as X, and
> not I:
>
> $ transeq asis:ATCATGATS -stdout -auto -osformat raw
> IMI

Same on EMBOSS 6.3.1, giving potentially wrong amino acid instead of X.

> Likewise H means A, G or C, so ATH shows the same bug, as do some
> other AT* codons:
>
> $ transeq asis:ATAATCATGATH -stdout -auto -osformat raw
> IIMI
>
> [*** This one strikes me as a clear bug ***]

Same on EMBOSS 6.3.1, giving potentially wrong amino acid instead of X.

As I noted before, this list is only partial, and only for the standard table. I could compile a much longer list of oddities using the Biopython translation as a reference if you wanted.

Regards,

Peter C.
_______________________________________________
EMBOSS mailing list
http://lists.open-bio.org/mailman/listinfo/emboss
