I have come across a problem with GenBank files when using the Perl module Bio::DB::GenBank. When I fetch a GenBank sequence from NCBI and write it out in GenBank format, the LOCUS line is missing the date.
I get

LOCUS AC104722 24949 bp DNA linear BCT

instead of

LOCUS AC104722 24949 bp DNA linear BCT 21-DEC-2001

which is what I get when I download the file myself. I don't know whether this is a problem in reading the file or in writing it.

Why am I cross-posting this to biojava? Well, the biojava parser dies on such a file with a message saying that the LOCUS line is too short.

Is the date a required element of the LOCUS line? Is there consensus on what constitutes the correct format? Has it changed recently?

David

I also noticed that the biojava parser is very picky about the number of spaces; delete a few spaces between DNA and linear and it dies too.

Exception in thread "main" org.biojava.bio.seq.io.ParseException: LOCUS line too short [LOCUS AC104719 17453 bp DNA linear BCT 21-DEC-2001]
        at org.biojava.bio.seq.io.GenbankContext.parseLocusLinePost127(GenbankFormat.java, Compiled Code)
        at org.biojava.bio.seq.io.GenbankContext.processHeaderLine(GenbankFormat.java, Compiled Code)
        at org.biojava.bio.seq.io.GenbankContext.processLine(GenbankFormat.java, Compiled Code)
        at org.biojava.bio.seq.io.GenbankFormat.readSequence(GenbankFormat.java, Compiled Code)
rethrown as org.biojava.bio.BioException: Could not read sequence
        at org.biojava.bio.seq.io.StreamReader.nextSequence(StreamReader.java, Compiled Code)
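
For illustration only: a LOCUS line can also be read by splitting on whitespace instead of depending on the exact number of spaces, with the date treated as optional. The sketch below is a hypothetical helper (the class LocusLine and its field names are made up here, not BioJava or BioPerl API). It assumes the fields come in the order shown in the examples above and that a date looks like 21-DEC-2001. It does not settle whether a LOCUS line without a date is valid GenBank.

```java
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of BioJava or BioPerl. */
final class LocusLine {
    // Assumed date shape, taken from the example "21-DEC-2001".
    private static final Pattern DATE = Pattern.compile("\\d{2}-[A-Z]{3}-\\d{4}");

    final String name;
    final int length;
    final String moleculeType;
    String topology;  // "linear" or "circular", null if absent
    String division;  // e.g. "BCT", null if absent
    String date;      // null if absent

    private LocusLine(String name, int length, String moleculeType) {
        this.name = name;
        this.length = length;
        this.moleculeType = moleculeType;
    }

    /** Parses a LOCUS line by whitespace-separated tokens; extra spaces are ignored. */
    static LocusLine parse(String line) {
        String[] t = line.trim().split("\\s+");
        // Assumed minimum: LOCUS <name> <length> bp <molecule type>
        if (t.length < 5 || !t[0].equals("LOCUS") || !t[3].equals("bp")) {
            throw new IllegalArgumentException("Not a recognisable LOCUS line: " + line);
        }
        LocusLine locus = new LocusLine(t[1], Integer.parseInt(t[2]), t[4]);
        // Remaining tokens are optional; classify each one by its shape.
        for (int i = 5; i < t.length; i++) {
            String tok = t[i];
            if (tok.equalsIgnoreCase("linear") || tok.equalsIgnoreCase("circular")) {
                locus.topology = tok;
            } else if (DATE.matcher(tok).matches()) {
                locus.date = tok;
            } else {
                locus.division = tok;
            }
        }
        return locus;
    }

    public static void main(String[] args) {
        LocusLine withDate = parse("LOCUS AC104722 24949 bp DNA linear BCT 21-DEC-2001");
        LocusLine noDate = parse("LOCUS   AC104722  24949 bp DNA   linear BCT");
        System.out.println(withDate.name + " date=" + withDate.date);
        System.out.println(noDate.name + " date=" + noDate.date);
    }
}
```

With this approach both of the LOCUS lines quoted above parse, and the date field is simply left null when it is missing, so the caller can decide whether to reject the record or carry on.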