Hello,

I tried biojavax today with a view to using the Genbank file parser.

My test file is a Genbank-formatted file produced by the Ensembl export system.

The head of the file is as follows:

LOCUS       6 489671 bp DNA HTG 13-FEB-2006
DEFINITION  Mus musculus chromosome 6 NCBIM34 partial sequence
           52296503..52786173 reannotated via EnsEMBL
ACCESSION   chromosome:NCBIM34:6:52296503:52786173:1
VERSION     chromosome:NCBIM34:6:52296503:52786173:1

I used the code provided in the biojavax docbook to parse this file.
I get the following error:

Exception in thread "main" org.biojava.bio.BioException: Could not read sequence
        at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:111)
        at org.embnet.be.biojavax.tryout.GenbankParseTest.main(GenbankParseTest.java:31)
Caused by: org.biojava.bio.seq.io.ParseException: Bad locus line found: 6 489671 bp DNA HTG 13-FEB-2006
        at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:229)
        at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:108)
        ... 1 more

I had a look at GenbankFormat.java, and I guess the problem comes from the regular expression, which does not recognize this LOCUS line as a standard Genbank LOCUS line.
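
To illustrate the kind of mismatch I suspect, here is a minimal sketch in plain Java. Both patterns are hypothetical and are NOT copied from GenbankFormat.java; the class name LocusLineCheck and the column-aligned sample line are also made up for the example. The strict pattern assumes fields separated by runs of at least two spaces, as in column-aligned Genbank output, while the lenient one accepts any whitespace between fields.

```java
import java.util.regex.Pattern;

public class LocusLineCheck {

    // Hypothetical strict pattern (an assumption, not the biojavax one):
    // name, length, molecule type, division and date must be separated
    // by at least two spaces, as in column-aligned LOCUS lines.
    public static final Pattern STRICT = Pattern.compile(
        "^(\\S+)\\s{2,}(\\d+)\\s+bp\\s{2,}(\\S+)\\s{2,}(\\S+)\\s{2,}(\\d{2}-[A-Z]{3}-\\d{4})$");

    // Hypothetical lenient pattern: same fields, but any run of
    // whitespace (including a single space) may separate them.
    public static final Pattern LENIENT = Pattern.compile(
        "^(\\S+)\\s+(\\d+)\\s+bp\\s+(\\S+)\\s+(\\S+)\\s+(\\d{2}-[A-Z]{3}-\\d{4})$");

    // Returns true if the text following the LOCUS tag matches the pattern.
    public static boolean matches(Pattern pattern, String locusValue) {
        return pattern.matcher(locusValue.trim()).matches();
    }

    public static void main(String[] args) {
        // Value reported in the ParseException for the Ensembl export.
        String ensembl = "6 489671 bp DNA HTG 13-FEB-2006";
        // Made-up column-aligned line, for comparison only.
        String aligned = "EXAMPLE01     5028 bp    DNA             PLN       21-JUN-1999";

        System.out.println("strict,  Ensembl: " + matches(STRICT, ensembl));
        System.out.println("lenient, Ensembl: " + matches(LENIENT, ensembl));
        System.out.println("strict,  aligned: " + matches(STRICT, aligned));
        System.out.println("lenient, aligned: " + matches(LENIENT, aligned));
    }
}
```

Under these assumed patterns, the strict one rejects the Ensembl line because its fields are separated by single spaces, while the lenient one accepts both lines. Whether the real parser fails for this reason or for another one is exactly what I cannot tell.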

Am I wrong? Has the biojavax Genbank parser been tested on Ensembl-exported files?

Morgane.

--
*************************************
Morgane THOMAS-CHOLLIER, PhD Student
Vrije Universiteit Brussels (VUB)
Laboratory of Cell Genetics
Pleinlaan 2
1050 Brussels
Belgium

_______________________________________________
Biojava-l mailing list  -  Biojava-l@biojava.org
http://biojava.org/mailman/listinfo/biojava-l
