Re: [Biojava-l] GenBank XML File Parse Error

Toralf Kirsten Fri, 23 Jan 2004 20:16:27 -0800

Thomas, thanks for your answer. The plain-text flat file you mentioned can be downloaded from the NCBI web page, so using it is no problem. However, we would prefer to use the XML file, because each term in it is accessible at an atomic level. Thanks again. Toralf

Thomas Down wrote:

Once upon a time, Toralf Kirsten wrote:

Hi,
I have to extract data from the GenBank XML files.
For this purpose I use the biojava API. But I get a parser error.

java.lang.StringIndexOutOfBoundsException: String index out of range: 12
    at java.lang.String.substring(String.java:1477)
    at org.biojava.bio.seq.io.GenbankContext.processHeaderLine(GenbankContext.java:621)
[snip]


The program is simple. The user selects the path and file name with the
FileChooser component. I then open the file and use the Sequence and
Annotation classes, as shown in the attached method, which is taken from an
extended file class.

What I need is the sequence data of the GenBank entry (accession, sequence, etc.) and the data for its features (start and end position, subtype such as tRNA or CDS, etc.).
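
As a rough illustration of the extraction described above, the following
sketch reads the accession, the sequence and the feature positions directly
from a GenBank XML record with the JDK's DOM parser, without BioJava. The
element names (GBSeq_primary-accession, GBSeq_sequence, GBFeature,
GBFeature_key, GBInterval_from, GBInterval_to) are assumed to follow NCBI's
GBSeq XML layout and should be checked against the actual files. The class
and method names are hypothetical, and the record in main is made-up sample
data.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper class; not part of BioJava.
public class GbXmlSketch {

    /** Parses the XML text and returns the first GBSeq element, or null if there is none. */
    public static Element parseFirstRecord(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return (Element) doc.getElementsByTagName("GBSeq").item(0);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse GenBank XML", e);
        }
    }

    /** Trimmed text of the first descendant with the given tag, or null if absent. */
    public static String firstText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    /** One "key:from..to" string per interval of every feature in the record. */
    public static List<String> features(Element gbSeq) {
        List<String> out = new ArrayList<>();
        NodeList feats = gbSeq.getElementsByTagName("GBFeature");
        for (int i = 0; i < feats.getLength(); i++) {
            Element feat = (Element) feats.item(i);
            String key = firstText(feat, "GBFeature_key");
            NodeList intervals = feat.getElementsByTagName("GBInterval");
            for (int j = 0; j < intervals.getLength(); j++) {
                Element iv = (Element) intervals.item(j);
                out.add(key + ":" + firstText(iv, "GBInterval_from")
                        + ".." + firstText(iv, "GBInterval_to"));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up sample record using the assumed element names.
        String xml = "<GBSet><GBSeq>"
            + "<GBSeq_primary-accession>X00001</GBSeq_primary-accession>"
            + "<GBSeq_feature-table><GBFeature>"
            + "<GBFeature_key>CDS</GBFeature_key>"
            + "<GBFeature_intervals><GBInterval>"
            + "<GBInterval_from>3</GBInterval_from>"
            + "<GBInterval_to>11</GBInterval_to>"
            + "</GBInterval></GBFeature_intervals>"
            + "</GBFeature></GBSeq_feature-table>"
            + "<GBSeq_sequence>acgtacgtacgt</GBSeq_sequence>"
            + "</GBSeq></GBSet>";
        Element rec = parseFirstRecord(xml);
        System.out.println(firstText(rec, "GBSeq_primary-accession"));
        System.out.println(firstText(rec, "GBSeq_sequence"));
        System.out.println(features(rec));
    }
}
```

Real NCBI files usually carry a DOCTYPE declaration, which the parser may
try to resolve, and DOM loads the whole document into memory, so a large
multi-record file would probably call for a streaming parser instead.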


I'm afraid that BioJava doesn't currently support the XML version
of GenBank records.  The GenBank parser you are using expects the
normal flatfile version of the GenBank records -- do you have
access to these?
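
Since the flatfile parser fails with an unhelpful exception when it is
given XML, a small guard before parsing can make the mismatch explicit.
The sketch below is only an illustration and not part of BioJava: the
class name, the enum and the sniff method are hypothetical. It assumes
that a flatfile record starts with a LOCUS line and that an XML file
starts with '<'.

```java
// Hypothetical pre-check; not part of BioJava.
public class GenbankFormatSniffer {

    public enum Format { FLATFILE, XML, UNKNOWN }

    /**
     * Guesses the format from the start of the file content.
     * Assumption: flatfile records begin with "LOCUS", XML begins with '<'.
     */
    public static Format sniff(String head) {
        String trimmed = head.trim(); // ignore leading blank lines and spaces
        if (trimmed.startsWith("<")) {
            return Format.XML;
        }
        if (trimmed.startsWith("LOCUS")) {
            return Format.FLATFILE;
        }
        return Format.UNKNOWN;
    }

    public static void main(String[] args) {
        // Made-up sample inputs: the first characters of two files.
        System.out.println(sniff("<?xml version=\"1.0\"?>\n<GBSet>"));
        System.out.println(sniff("LOCUS       X00001   12 bp    DNA"));
    }
}
```

A caller would read the first few hundred characters of the chosen file,
pass them to sniff, and hand the file to the GenBank flatfile parser only
when the result is FLATFILE.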

We should probably look at adding GenBank XML support to BioJava.
Does anyone know how widely it's used?  (I must admit I haven't met
it before.)

Thomas.

_______________________________________________
Biojava-l mailing list  -  [EMAIL PROTECTED]
http://biojava.org/mailman/listinfo/biojava-l
