Andrey,

I compared the latest CVS checkout with one checked out a few days ago. Some changes have been made to fix the ArrayIndexOutOfBoundsException problem:
---------------------------
700c700,701
<       for (int i = 0; i < TAG_LENGTH; i++)
---
>       int len = Math.min(l.length, TAG_LENGTH); // handles empty lines better
>       for (int i = 0; i < len; i++)
----------------------------

Also, the new GenbankFormat.java catches the ArrayIndexOutOfBoundsException, so you should no longer see it thrown in the output. Indeed, my test with the new code also appears to succeed:

-----------------------------------
$ java -Xmx100M TestReadingGenBankFiles
Loading sequence...
Loaded...
Getting seq...
Got chr_X

** I had to increase the maximum heap size to around 100M or larger (setting it to 80M still throws java.lang.OutOfMemoryError).
--------------------------------

So I would suggest that you try the latest biojava build or CVS checkout.

Regards,
Chun-Nuan

Andrey Zinovyev wrote:

>Hi!
>
>What's wrong with this code:
>
>I've got a sequence in GenBank format from
>ftp://ncbi.nlm.nih.gov/genbank/genomes/C_elegans/CHR_I/worm_X.gbk
>
>and tried to parse it with this code:
>-----------------------
>import org.biojava.bio.*;
>import org.biojava.bio.symbol.*;
>import org.biojava.bio.seq.*;
>import org.biojava.bio.seq.io.*;
>import java.io.*;
>import java.util.*;
>
>public class TestReadingGenBankFiles {
>  public TestReadingGenBankFiles() {
>  }
>  public static void main(String[] args) {
>    try {
>      File GenBankFile = new File("worm_X.gbk");
>      System.out.println("Loading sequence...");
>      BufferedReader eReader = new BufferedReader(
>          new InputStreamReader(new FileInputStream(GenBankFile)));
>      SequenceIterator seqI = SeqIOTools.readGenbank(eReader);
>      System.out.println("Loaded...");
>      System.out.println("Getting seq...");
>      Sequence seq = seqI.nextSequence();
>      System.out.println("Got " + seq.getName());
>    } catch (Throwable t) { t.printStackTrace(); }
>  }
>}
>---------------------------
>
>Though this code worked on many sequences, here I get
>
>Loading sequence...
>Loaded...
>Getting seq...
>java.lang.ArrayIndexOutOfBoundsException
> at org.biojava.bio.seq.io.GenbankContext.hasHeaderTag(GenbankFormat.java:685)
> at org.biojava.bio.seq.io.GenbankContext.processHeaderLine(GenbankFormat.java:544)
> at org.biojava.bio.seq.io.GenbankContext.processFeatureLine(GenbankFormat.java:497)
> at org.biojava.bio.seq.io.GenbankContext.processLine(GenbankFormat.java:364)
> at org.biojava.bio.seq.io.GenbankFormat.readSequence(GenbankFormat.java:137)
> at org.biojava.bio.seq.io.StreamReader.nextSequence(StreamReader.java:100)
>rethrown as org.biojava.bio.BioException: Could not read sequence
> at org.biojava.bio.seq.io.StreamReader.nextSequence(StreamReader.java:103)
> at caijava.TestReadingGenBankFiles.main(TestReadingGenBankFiles.java:35)
>-------------------------------------
>
>What's wrong? Can the parser be applied to such long sequences, or
>are there limitations?
>
>Thanks,
>Andrey Zinovyev.
>
>_______________________________________________
>Biojava-l mailing list - [EMAIL PROTECTED]
>http://biojava.org/mailman/listinfo/biojava-l
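
P.S. In case it helps to see why the patched loop no longer runs off the end of a short line, here is a minimal standalone sketch of the same idea. It is not the actual GenbankFormat.java code: the class name TagCheckSketch, the method hasTag, and the value 12 for TAG_LENGTH are assumptions made up for illustration. Only the Math.min clamp mirrors the diff above.

```java
// Hypothetical illustration only; not taken from the biojava sources.
class TagCheckSketch {
    // Assumed width of the tag column. The real TAG_LENGTH value is not shown in this thread.
    static final int TAG_LENGTH = 12;

    // Returns true if any of the first TAG_LENGTH characters of the line is non-blank.
    static boolean hasTag(char[] line) {
        // Clamp the loop bound so that empty or short lines cannot be indexed past their end.
        int len = Math.min(line.length, TAG_LENGTH);
        for (int i = 0; i < len; i++) {
            if (line[i] != ' ') {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasTag("LOCUS       CHR_X".toCharArray())); // prints true
        System.out.println(hasTag("".toCharArray()));                  // prints false
        System.out.println(hasTag("     ".toCharArray()));             // prints false
    }
}
```

With the old bound (i < TAG_LENGTH), the second and third calls would index past the end of the array, which is the kind of failure shown in the stack trace.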
