Peter C. wrote:
> However, consider the codon TRR. R means A or G, so this can mean TAA,
> TGA, TAG or TGG which translate to stop or W (both EMBOSS and the NCBI
> standard table agree here). Therefore the translation of TRR should be
> "* or W", which I would expect based on the above examples to result
> in "X". But instead EMBOSS transeq gives "*":

This is a side effect of the way backtranslation works. EMBOSS calculates the "most ambiguous codon" for each amino acid and for stop, and uses it for backtranslation. Thus a '*' in a protein sequence is rendered as 'TRR' by backtranseq. To keep translation of backtranseq results consistent, TRR is assumed to be a backtranslated stop. Similarly, MGN is 'R' because it could reasonably result from a backtranslation of 'R'.

I agree that it would also be reasonable to be strict about translation in transeq and render TRR as 'X'. It depends on your philosophy of where the ambiguity codes came from: from backtranslation, or from the curious mind of a bioinformatician :-)

So ... it's not a bug, it's a feature ... which means I can relax for now and contemplate some extras in the next release.

In future, we will at least make sure that TRA and other 'unambiguous ambiguous codons' are understood as '*' etc. TRR I would prefer to leave as it is by default, with an option for rendering it as 'X', or an alternative to transeq with the strict translation rules enforced.

regards,

Peter Rice
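
To make the two philosophies concrete, here is a minimal Python sketch. It is not EMBOSS code. It assumes the "most ambiguous codon" is the position-by-position union of all codons for a residue in the NCBI standard table, which reproduces the TRR and MGN examples above but may differ from what EMBOSS actually does. The function names (expand, translate_strict, most_ambiguous_codon, translate_lenient) are made up for illustration.

```python
from itertools import product

# NCBI standard genetic code (table 1), codons enumerated with bases in T, C, A, G order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}

# IUPAC nucleotide ambiguity codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
BASES_TO_CODE = {frozenset(bases): code for code, bases in IUPAC.items()}


def expand(codon):
    """List every unambiguous codon that an ambiguous codon can stand for."""
    return ["".join(p) for p in product(*(IUPAC[b] for b in codon.upper()))]


def translate_strict(codon):
    """Return the residue only if all expansions agree, otherwise 'X'."""
    residues = {CODON_TABLE[c] for c in expand(codon)}
    return residues.pop() if len(residues) == 1 else "X"


def most_ambiguous_codon(residue):
    """Assumed rule: take the union of bases at each position over all
    codons that encode the residue, and write it as an IUPAC code."""
    codons = [c for c, aa in CODON_TABLE.items() if aa == residue]
    return "".join(BASES_TO_CODE[frozenset(c[i] for c in codons)] for i in range(3))


# Backtranslation table: most ambiguous codon -> residue it came from.
BACK_TABLE = {most_ambiguous_codon(aa): aa for aa in set(AMINO_ACIDS)}


def translate_lenient(codon):
    """Treat a backtranslation codon as its source residue, else be strict."""
    codon = codon.upper()
    return BACK_TABLE.get(codon, translate_strict(codon))


for codon in ("TRA", "TRR", "MGN"):
    print(codon, translate_strict(codon), translate_lenient(codon))
# TRA * *
# TRR X *
# MGN X R
```

TRA comes out as '*' under both rules because TAA and TGA are both stops. TRR and MGN differ only because the lenient rule looks the codon up in the backtranslation table before expanding it.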
