Hi Morgane - I have to say that doesn't look much like Genbank : )
The biojavax parsers are possibly a bit brittle due to their use of regexps to recognize key elements. It should be fixable; I think the problem is that the parser expects a word after LOCUS, not a number. This may not be the only problem, though.

Could you post the entire file? Or, if it is large, a representative file of smaller size.

- Mark

Morgane THOMAS-CHOLLIER <[EMAIL PROTECTED]>
Sent by: [EMAIL PROTECTED]
02/14/2006 04:36 AM

To: biojava-l@biojava.org
cc: (bcc: Mark Schreiber/GP/Novartis)
Subject: [Biojava-l] Genbank parser error [biojavax]

Hello,

I tried biojavax today with a view to using the Genbank file parser. My test file is a Genbank-formatted file produced by the Ensembl export system. The head of the file is as follows:

LOCUS 6 489671 bp DNA HTG 13-FEB-2006
DEFINITION Mus musculus chromosome 6 NCBIM34 partial sequence 52296503..52786173 reannotated via EnsEMBL
ACCESSION chromosome:NCBIM34:6:52296503:52786173:1
VERSION chromosome:NCBIM34:6:52296503:52786173:1

I used the code provided in the biojavax docbook to parse this file. I get the following error:

Exception in thread "main" org.biojava.bio.BioException: Could not read sequence
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:111)
    at org.embnet.be.biojavax.tryout.GenbankParseTest.main(GenbankParseTest.java:31)
Caused by: org.biojava.bio.seq.io.ParseException: Bad locus line found: 6 489671 bp DNA HTG 13-FEB-2006
    at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:229)
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:108)
    ... 1 more

I had a look at GenbankFormat.java, and I guess the problem comes from the regular expression, which does not recognize this LOCUS line as a standard Genbank LOCUS line. Am I wrong? Has the biojavax Genbank parser been tested on Ensembl-exported files?

Morgane.
--
*************************************
Morgane THOMAS-CHOLLIER, PHD Student
Vrije Universiteit Brussels (VUB)
Laboratory of Cell Genetics
Pleinlaan 2
1050 Brussels
Belgium

_______________________________________________
Biojava-l mailing list - Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l
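
To make the suspected failure mode in this thread concrete, here is a minimal sketch of how a regex that insists on a letter-led entry name would reject the Ensembl LOCUS value quoted above, while a more lenient one accepts it. Both patterns are assumptions written for illustration: they are not the expressions used in GenbankFormat.java, and the class and method names (LocusLineSketch, nameOf) are hypothetical. The input is the text that follows the LOCUS tag, as it appears in the "Bad locus line found" message.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LocusLineSketch {

    // Hypothetical "strict" pattern: the entry name must start with a letter.
    // NOT the real biojavax pattern; it only imitates the suspected problem.
    public static final Pattern STRICT = Pattern.compile(
        "^([A-Za-z]\\S*)\\s+(\\d+)\\s+bp\\s+(\\S+)\\s+(?:(circular|linear)\\s+)?(\\S+)\\s+(\\S+)$");

    // Lenient variant: any run of non-whitespace characters is accepted as the
    // entry name, so a bare chromosome number such as "6" is allowed.
    public static final Pattern LENIENT = Pattern.compile(
        "^(\\S+)\\s+(\\d+)\\s+bp\\s+(\\S+)\\s+(?:(circular|linear)\\s+)?(\\S+)\\s+(\\S+)$");

    /** Returns the entry name if the pattern matches the LOCUS value, otherwise null. */
    public static String nameOf(Pattern pattern, String locusValue) {
        Matcher m = pattern.matcher(locusValue.trim());
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // LOCUS value taken from the error message in this thread.
        String ensembl = "6 489671 bp DNA HTG 13-FEB-2006";
        System.out.println("strict:  " + nameOf(STRICT, ensembl));
        System.out.println("lenient: " + nameOf(LENIENT, ensembl));
    }
}
```

With the Ensembl value, the strict pattern returns null and the lenient one returns the name "6". Whether relaxing the name token is enough for the real parser is an open question; as noted in the reply, this may not be the only problem with the file.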