Hi there,

Does anyone know how to create a SymbolList from a String that contains illegal symbols?
I encountered an IllegalSymbolException when I tried to retrieve sequences from a sequence database. The sequence that gave me trouble was a RefSeq sequence, accession number NT_039621, Mus musculus chromosome 15 genomic contig. I first used DNATools.createDNA(String dna) and got an IllegalSymbolException indicating that there was at least one 'u' in the sequence. I then used NucleotideTools.createNucleotide(String nucleotide). This time the 'u' did not cause any problem, but I still got an IllegalSymbolException, now indicating that there was an 'l' in the sequence.

I am afraid there may be lots of illegal symbols in GenBank's sequences, so I am wondering if there is a way to create an error-tolerant SymbolList object. If not, I am afraid I will have to:

1. create an Alphabet object containing Symbols that cover every char in Java,
2. use this Alphabet to create a CharacterTokenization with the CharacterTokenization(Alphabet alpha, boolean caseSensitive) constructor, and
3. pass the resulting CharacterTokenization to SimpleSymbolList(SymbolTokenization st, String seqString) to get a SimpleSymbolList object.

I guess there must be a better way to do this in BioJava. If I do have to create an Alphabet that covers every char in Java, how can I do it?

I originally thought that merging the NUCLEOTIDE and PROTEIN Alphabets into a new Alphabet would cover all the Symbols in GenBank sequences, but I noticed there is no method in AlphabetManager for merging two Alphabets. Is there a way to merge two Alphabets? If not, it is probably worth implementing one. It would be useful not only for handling illegal symbols that exist in the databases, but also for other applications, such as using non-standard symbols to generate a blastable MSBlast database.

Your help is highly appreciated. Thanks a lot.

Regards,
Tao
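
For reference, one possible stopgap is to inspect the raw String before handing it to DNATools or NucleotideTools. The sketch below is plain Java and does not use any BioJava API; the class name SequenceCheck, its two methods, and the ALLOWED character set are all hypothetical and only illustrate the idea. It reports each unexpected character with its position, and can optionally replace such characters with 'n'.

```java
import java.util.ArrayList;
import java.util.List;

public class SequenceCheck {
    // Assumption: IUPAC nucleotide codes plus the gap character.
    // This set is hard-coded here and is NOT read from a BioJava Alphabet.
    private static final String ALLOWED = "acgtunrykmswbdhv-";

    /** Lists every character outside ALLOWED, with its 1-based position. */
    public static List<String> findIllegal(String seq) {
        List<String> problems = new ArrayList<>();
        for (int i = 0; i < seq.length(); i++) {
            char c = Character.toLowerCase(seq.charAt(i));
            if (ALLOWED.indexOf(c) < 0) {
                problems.add("'" + seq.charAt(i) + "' at position " + (i + 1));
            }
        }
        return problems;
    }

    /** Returns a copy of seq with every character outside ALLOWED replaced by 'n'. */
    public static String sanitize(String seq) {
        StringBuilder sb = new StringBuilder(seq.length());
        for (int i = 0; i < seq.length(); i++) {
            char c = seq.charAt(i);
            boolean ok = ALLOWED.indexOf(Character.toLowerCase(c)) >= 0;
            sb.append(ok ? c : 'n');
        }
        return sb.toString();
    }
}
```

Calling SequenceCheck.findIllegal(seqString) first would show exactly which characters and positions trigger the exception, which may help tell genuinely non-standard symbols apart from a retrieval that returned something other than sequence data. Whether replacing them with 'n' is acceptable depends on the application.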
