Hello *,

The cookbook uses the SeqIOTools class in its examples for reading files, but the API marks it as deprecated. Looking for alternatives, I searched the list and the internet and found that biojavax provides classes and methods for reading files (RichSequence.IOTools).

For example, I am trying to read an EMBL file:

--begin:code--

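import java.io.BufferedReader;
import java.io.FileReader;
import org.biojavax.Namespace;
import org.biojavax.RichObjectFactory;
import org.biojavax.bio.seq.RichSequence;
import org.biojavax.bio.seq.RichSequenceIterator;

BufferedReader br = new BufferedReader(new FileReader(filename));
Namespace ns = RichObjectFactory.getDefaultNamespace();
RichSequenceIterator seqs = RichSequence.IOTools.readEMBLDNA(br, ns);

// Print the name and annotations of every sequence in the file
while (seqs.hasNext()) {
   RichSequence seq = seqs.nextRichSequence();
   System.out.println(seq.getName() + ":" + seq.getAnnotation().asMap());
}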

--end:code--

But I always get this error message:

--begin:error--

org.biojava.bio.BioException: Could not read sequence
       at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
       at ReadGenbankFile.EMBL(ReadGenbankFile.java:42)
       at ReadGenbankFile.main(ReadGenbankFile.java:85)
Caused by: org.biojava.bio.seq.io.ParseException:

A Exception Has Occurred During Parsing.
Please submit the details that follow to [email protected] or post a bug report to http://bugzilla.open-bio.org/

Format_object=org.biojavax.bio.seq.io.EMBLFormat
Accession=null
Id=not set
Comments=
Parse_block=ID AJ243265_2; parent: AJ243265AC AJ243265;FT CDS join(<1082..1272,2484..2638,4926..>5041)
               /codon_start=3
               /gene="PGM1"
               /product="phosphoglucomutase 1"
               /function="carbohydrate metabolism"
               /EC_number="5.4.2.2"
               /db_xref="GOA:Q9H1D2"
               /db_xref="HGNC:8905"
               /db_xref="HSSP:3PMG"
               /db_xref="InterPro:IPR016055"
               /db_xref="UniProtKB/TrEMBL:Q9H1D2"
               /protein_id="CAC19809.1"
               /translation="VGPYVKKILCEELGAPANSAVNCVPLEDFGGHHPDPNLTYAADLV
               ETMKSGEHDFGAAFDGDGDRNMILGKHGFFVNPSDSVAVIAANTFSIPYFQQTGVRGFA
RSMPTSGALDRVASATKIALYETPTGWKFFGNLMDASKLSLCGEESFGT"SQ Sequence 462 BP;
Stack trace follows ....


       at org.biojavax.bio.seq.io.EMBLFormat.readSection(EMBLFormat.java:775)
       at org.biojavax.bio.seq.io.EMBLFormat.readRichSequence(EMBLFormat.java:284)
       at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
       ... 2 more
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -3
       at java.lang.String.substring(String.java:1949)
       at java.lang.String.substring(String.java:1916)
       at org.biojavax.bio.seq.io.EMBLFormat.readSection(EMBLFormat.java:761)
       ... 4 more

--end:error--

The file looks fine to me, and it works well with the deprecated SeqIOTools:

--begin:embl-file--
ID   AJ243265_2; parent: AJ243265
AC   AJ243265;
FT   CDS             join(<1082..1272,2484..2638,4926..>5041)
FT                   /codon_start=3
FT                   /gene="PGM1"
FT                   /product="phosphoglucomutase 1"
FT                   /function="carbohydrate metabolism"
FT                   /EC_number="5.4.2.2"
FT                   /db_xref="GOA:Q9H1D2"
FT                   /db_xref="HGNC:8905"
FT                   /db_xref="HSSP:3PMG"
FT                   /db_xref="InterPro:IPR016055"
FT                   /db_xref="UniProtKB/TrEMBL:Q9H1D2"
FT                   /protein_id="CAC19809.1"
FT                   /translation="VGPYVKKILCEELGAPANSAVNCVPLEDFGGHHPDPNLTYAADLV
FT                   ETMKSGEHDFGAAFDGDGDRNMILGKHGFFVNPSDSVAVIAANTFSIPYFQQTGVRGFA
FT                   RSMPTSGALDRVASATKIALYETPTGWKFFGNLMDASKLSLCGEESFGT"
SQ   Sequence   462 BP;
     ttgtgggacc gtatgtaaag aagatcctct gtgaagaact cggtgcccct gcgaactcgg        60
     cagttaactg cgttcctctg gaggactttg gaggccacca ccctgacccc aacctcacct       120
     atgcagctga cctggtggag accatgaagt caggagagca tgattttggg gctgcctttg       180
     atggagatgg ggatcgaaac atgattctgg gcaagcatgg gttctttgtg aacccttcag       240
     actctgtggc tgtcattgct gccaacacct tcagcattcc gtatttccag cagactgggg       300
     tccgcggttt tgcacggagc atgcccacga gtggtgctct ggaccgggtg gctagtgcta       360
     caaagattgc tttgtatgag accccaactg gctggaagtt ttttgggaat ttgatggacg       420
     cgagcaaact gtccctttgt ggggaggaga gcttcgggac cg                          462
//
--end:embl-file--

The parser always crashes before reading the sequence (ttgt..., directly after the BP;).
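
To rule out a layout problem in the file itself, a standalone check along the following lines could be run before handing the file to the parser. This is only a minimal sketch and not part of BioJava: the class name EmblLayoutCheck and the method findSuspectLines are made up for illustration, and it assumes the usual EMBL flat-file layout (a two-letter line code followed by three blanks, sequence data lines starting with five blanks, and "//" as the terminator). It does not reproduce what EMBLFormat does internally, so it may well miss the real cause of the exception.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class EmblLayoutCheck {

    /**
     * Returns the 1-based numbers of all lines that do not fit the
     * assumed EMBL layout (hypothetical helper, not a BioJava API).
     */
    public static List<Integer> findSuspectLines(List<String> lines) {
        List<Integer> suspect = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            // Entry terminator and bare spacer lines carry no content
            if (line.equals("//") || line.equals("XX")) {
                continue;
            }
            // Sequence data lines start with five blanks
            if (line.startsWith("     ")) {
                continue;
            }
            // Everything else: two upper-case letters, three blanks, then content
            boolean ok = line.length() > 5
                    && Character.isUpperCase(line.charAt(0))
                    && Character.isUpperCase(line.charAt(1))
                    && line.substring(2, 5).equals("   ");
            if (!ok) {
                suspect.add(i + 1);
            }
        }
        return suspect;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to the EMBL file to check
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        for (int n : findSuspectLines(lines)) {
            System.out.println("Suspect line " + n + ": " + lines.get(n - 1));
        }
    }
}
```

If this reports no suspect lines, the line layout is probably not the problem and the exception would have to come from somewhere else in the parsing.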

Any suggestions on how to get this to work?
Or are there other alternatives for replacing the deprecated SeqIOTools class?

Thanks in advance,

with best regards,

Oliver
_______________________________________________
Biojava-l mailing list  -  [email protected]
http://lists.open-bio.org/mailman/listinfo/biojava-l