On Fri, Oct 05, 2001 at 08:06:23PM -0400, Cox, Greg wrote:
> When converting a DNA strand to an RNA strand, RNATools has a hardcoded
> T -> U and returns the symbol otherwise. This breaks if an ambiguous
> nucleotide is passed in, since they don't trip the T check. I looked in
> the alphabet XML file, and there are no ambiguous RNA symbols.
>
> The use case I'm facing is translating a DNA sequence. The translation in
> BioJava goes through an RNA sequence, so ambiguous residues foul it up.
>
> So, I propose one of the following solutions:
>
> * Introduce ambiguous RNA symbols that are analogous to the DNA symbols.
>
> * Introduce one ambiguous RNA symbol that all ambiguous DNA symbols map to.
>
> * Break the biological parallel and translate DNA directly to amino acids.
>
> If I don't hear from anyone, I'll do the third.

Can I put in a vote for option 1? I think that's what was really
intended by the current design, and it seems to me to be the
`least surprise' option. It should just mean adding the relevant bits to
AlphabetManager.xml and fixing the DNA -> RNA translator.

The current handling of ambiguity symbols is far from wonderful. I
wrote a patchset a while back which handled these in a much tidier
way, and also addressed issues with the current SymbolParser. I never
had time to get this 100% finished, but I can send a copy to anyone
who's interested, or get it checked in on a CVS branch. What's left is
basically:

  - Sync up with current source tree (should be quite easy)
  - Tidy up parsing of cross-product symbols
  - Testing :)

I'll try to get it finished off next week.

Thomas.
_______________________________________________
Biojava-l mailing list - [EMAIL PROTECTED]
http://biojava.org/mailman/listinfo/biojava-l
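
For anyone who wants to see what option 1 amounts to, here is a minimal sketch in plain Java. It does not use the BioJava API: the class AmbiguousTranscription and the methods rnaMembers and toRna are hypothetical names invented for illustration, and the table assumes the standard IUPAC nucleotide codes. Each ambiguous DNA symbol gets an analogous RNA symbol whose member set is the same except that T becomes U.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only; not part of BioJava.
final class AmbiguousTranscription {
    // IUPAC DNA codes mapped to the unambiguous bases each one can stand for.
    private static final Map<Character, String> DNA_MEMBERS = Map.ofEntries(
        Map.entry('A', "A"), Map.entry('C', "C"),
        Map.entry('G', "G"), Map.entry('T', "T"),
        Map.entry('R', "AG"), Map.entry('Y', "CT"),
        Map.entry('S', "CG"), Map.entry('W', "AT"),
        Map.entry('K', "GT"), Map.entry('M', "AC"),
        Map.entry('B', "CGT"), Map.entry('D', "AGT"),
        Map.entry('H', "ACT"), Map.entry('V', "ACG"),
        Map.entry('N', "ACGT"));

    // Member bases of the RNA symbol analogous to the given DNA code:
    // the same set, with T replaced by U.
    static Set<Character> rnaMembers(char dnaCode) {
        String members = DNA_MEMBERS.get(Character.toUpperCase(dnaCode));
        if (members == null) {
            throw new IllegalArgumentException("Not a DNA code: " + dnaCode);
        }
        Set<Character> rna = new TreeSet<>();
        for (char base : members.toCharArray()) {
            rna.add(base == 'T' ? 'U' : base);
        }
        return rna;
    }

    // Converts a DNA string to RNA. Ambiguity codes keep their letter,
    // because the IUPAC letters are shared between DNA and RNA; only the
    // meaning of the member set changes (see rnaMembers).
    static String toRna(String dna) {
        StringBuilder rna = new StringBuilder(dna.length());
        for (char code : dna.toCharArray()) {
            rnaMembers(code); // rejects characters that are not DNA codes
            char upper = Character.toUpperCase(code);
            rna.append(upper == 'T' ? 'U' : upper);
        }
        return rna.toString();
    }

    public static void main(String[] args) {
        System.out.println(toRna("ATGNRY"));  // prints AUGNRY
        System.out.println(rnaMembers('Y'));  // prints [C, U]
    }
}
```

The point of the sketch is that the ambiguous symbols pass through conversion as symbols in their own right rather than falling through a hardcoded T check, so a later translation step can still reason about which bases each position might hold.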