Thomas,
thanks for your answer.
The plain ASCII text, or normal flat file as you called it, can be downloaded from the NCBI web page, so there is no problem using it. But we would like to use the XML file, because each term is accessible at the atomic level.
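
To illustrate what is meant by access at the atomic level, here is a minimal sketch that uses only the JDK's DOM parser, not BioJava. The element names (GBSeq_primary-accession, GBSeq_sequence, GBFeature, GBFeature_key, GBInterval_from, GBInterval_to) are assumed from NCBI's GBSeq XML format, and the class name, helper methods and sample record are made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class GenBankXmlSketch {

    // Made-up record; real files also carry a DOCTYPE line, omitted here.
    public static final String SAMPLE =
        "<GBSet><GBSeq>"
        + "<GBSeq_primary-accession>XX000001</GBSeq_primary-accession>"
        + "<GBSeq_feature-table>"
        + "<GBFeature><GBFeature_key>source</GBFeature_key><GBFeature_intervals><GBInterval>"
        + "<GBInterval_from>1</GBInterval_from><GBInterval_to>12</GBInterval_to>"
        + "</GBInterval></GBFeature_intervals></GBFeature>"
        + "<GBFeature><GBFeature_key>CDS</GBFeature_key><GBFeature_intervals><GBInterval>"
        + "<GBInterval_from>3</GBInterval_from><GBInterval_to>11</GBInterval_to>"
        + "</GBInterval></GBFeature_intervals></GBFeature>"
        + "</GBSeq_feature-table>"
        + "<GBSeq_sequence>atgcatgcatgc</GBSeq_sequence>"
        + "</GBSeq></GBSet>";

    /** Parses the XML text and returns its root element. */
    static Element root(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Not well-formed XML", e);
        }
    }

    /** Text of the first descendant element with this tag, or "" if absent. */
    static String firstText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    public static String accession(String xml) {
        return firstText(root(xml), "GBSeq_primary-accession");
    }

    public static String sequence(String xml) {
        return firstText(root(xml), "GBSeq_sequence");
    }

    /** One "key:from..to" entry per feature, using its first interval only. */
    public static List<String> features(String xml) {
        List<String> out = new ArrayList<>();
        NodeList feats = root(xml).getElementsByTagName("GBFeature");
        for (int i = 0; i < feats.getLength(); i++) {
            Element f = (Element) feats.item(i);
            out.add(firstText(f, "GBFeature_key") + ":"
                + firstText(f, "GBInterval_from") + ".."
                + firstText(f, "GBInterval_to"));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(accession(SAMPLE));
        System.out.println(sequence(SAMPLE));
        System.out.println(features(SAMPLE));
    }
}
```

The sketch reads only the first record and the first interval of each feature. A file downloaded from NCBI may declare an external DTD, which the parser can try to fetch, so this would need adjusting for real data.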
Thanks again.
Toralf


Thomas Down wrote:

Once upon a time, Toralf Kirsten wrote:


Hi,
I have to extract data from GenBank XML files.
For this purpose I use the BioJava API, but I get a parser error:

java.lang.StringIndexOutOfBoundsException: String index out of range: 12
at java.lang.String.substring(String.java:1477)
at org.biojava.bio.seq.io.GenbankContext.processHeaderLine(GenbankContext.java:621)
[snip]


The program is quite simple. The user specifies the path and file name with the FileChooser component. Then I open the file and apply the Sequence and Annotation classes, as can be seen in the attached method, which is taken from an extended file class.

What I need is the sequence data of the GenBank entry (accession, sequence, etc.) and also its features (start and end position, subtype such as tRNA, CDS, etc.).



I'm afraid that BioJava doesn't currently support the XML version of GenBank records. The GenBank parser you are using expects the normal flatfile version of the records. Do you have access to these?
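
One way to avoid handing an XML file to a flatfile parser is to check the first non-blank line before parsing. The following is only a sketch under assumptions: it treats a leading "<" as XML and a leading "LOCUS" as a flatfile record, and the class and method names are hypothetical, not part of BioJava.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class GenBankFormatSniffer {

    public enum Format { FLATFILE, XML, UNKNOWN }

    /** Guesses the format from the first non-blank line of the content. */
    public static Format sniff(String content) {
        try (BufferedReader reader = new BufferedReader(new StringReader(content))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (trimmed.isEmpty()) {
                    continue; // skip leading blank lines
                }
                if (trimmed.startsWith("<")) {
                    return Format.XML; // XML declaration, DOCTYPE or root element
                }
                if (trimmed.startsWith("LOCUS")) {
                    return Format.FLATFILE; // assumed first line of a flatfile record
                }
                return Format.UNKNOWN; // the first real line decides
            }
            return Format.UNKNOWN; // empty input
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sniff("<?xml version=\"1.0\"?>\n<GBSet></GBSet>"));
        System.out.println(sniff("LOCUS       XX000001  12 bp  DNA"));
    }
}
```

A caller could pass the content to the flatfile parser only when the result is FLATFILE and report a clear message otherwise, instead of failing with a StringIndexOutOfBoundsException.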

We should probably look at adding GenBank XML support to BioJava.
Does anyone know how widely it's used? (I must admit I haven't met
it before.)

Thomas.


_______________________________________________
Biojava-l mailing list  -  [EMAIL PROTECTED]
http://biojava.org/mailman/listinfo/biojava-l
