Hi Tao,

Am I right that you want to read in GenBank data? You might want to take a look at this page of BioJava in Anger: http://www.biojava.org/docs/bj_in_anger/ReadingGES.htm
This page describes how to read in sequence data from GenBank. I hope this helps.

Regards,
Frans

> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:biojava-l-[EMAIL PROTECTED] On Behalf Of Tao Xu
> Sent: Tuesday, December 09, 2003 10:02 AM
> To: [EMAIL PROTECTED]
> Subject: [Biojava-l] How to create a SymbolList with a String that
> contains illegal Char
>
> Hi there,
>
> Does anyone know how to create a SymbolList from a String that
> contains illegal symbols?
>
> I encountered an IllegalSymbolException when I tried to retrieve
> sequences from a sequence database. The sequence that gave me the
> trouble was a RefSeq sequence, accession number NT_039621, Mus
> musculus chromosome 15 genomic contig. I first used
> DNATools.createDNA(String dna) and got an IllegalSymbolException
> indicating that there was at least one 'u' in the sequence. I then
> used NucleotideTools.createNucleotide(String nucleotide). This time
> the 'u' did not cause any problem, but I still got an
> IllegalSymbolException indicating that there was an 'l' in the
> sequence.
>
> I am afraid there must be lots of illegal symbols in GenBank's
> sequences, so I am wondering if there is a way to create an
> error-tolerant SymbolList object. If not, I am afraid I have to:
>
>   1. create an Alphabet object containing Symbols that cover every
>      char in Java,
>   2. use this Alphabet to create a CharacterTokenization with the
>      CharacterTokenization(Alphabet alpha, boolean caseSensitive)
>      constructor, and
>   3. pass the resulting CharacterTokenization to
>      SimpleSymbolList(SymbolTokenization st, String seqString) to
>      get a SimpleSymbolList object.
>
> I guess there must be a better way in BioJava to do this. Your help
> is highly appreciated.
>
> If I have to create an Alphabet that covers every char in Java, how
> can I do it? I originally thought that merging the NUCLEOTIDE and
> PROTEIN Alphabets into a new Alphabet would cover all the Symbols in
> GenBank sequences, but I noticed there is no method in
> AlphabetManager to merge two Alphabets. Is there a way to merge two
> Alphabets? If not, it is probably worth implementing one. It would be
> useful not only for handling illegal symbols that exist in the
> databases, but also for other applications, such as using
> non-standard symbols to generate a blastable MSBlast database.
>
> Thanks a lot for your help.
>
> Regards,
>
> Tao
>
> _______________________________________________
> Biojava-l mailing list - [EMAIL PROTECTED]
> http://biojava.org/mailman/listinfo/biojava-l
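
A note for anyone hitting the same IllegalSymbolException: before building a catch-all Alphabet, it may help to check which characters in the retrieved string are actually illegal and where they first occur. The sketch below is plain Java with no BioJava calls. The class name IllegalCharScan, the method findIllegal, and the allowed set "acgtn" are hypothetical placeholders for illustration, not BioJava's real DNA tokenization, which accepts more ambiguity codes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class IllegalCharScan {

    // Returns every character of seq that is not in allowed (compared
    // case-insensitively), mapped to the 0-based index of its first
    // occurrence. Insertion order follows the order of discovery.
    static Map<Character, Integer> findIllegal(String seq, String allowed) {
        String ok = allowed.toLowerCase();
        Map<Character, Integer> bad = new LinkedHashMap<>();
        for (int i = 0; i < seq.length(); i++) {
            char c = Character.toLowerCase(seq.charAt(i));
            if (ok.indexOf(c) < 0 && !bad.containsKey(c)) {
                bad.put(c, i);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Assumption: "acgtn" is only a stand-in for the alphabet's tokens.
        Map<Character, Integer> bad = findIllegal("acgtnuacgl", "acgtn");
        System.out.println(bad); // prints {u=5, l=9}
    }
}
```

If the reported positions fall at the very start of the string, or the illegal characters look like ordinary text rather than sequence data, the string handed to the parser may not be a raw sequence at all, which points back to reading the record with a proper GenBank reader.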
