Hi Scott,

Sorry for the delayed response. The fix was a little more complicated than expected, and I hope it does not break any existing code. I added a test for GenbankRichSequenceDB. Everything should be committed by now, and you can check out the result on the BioJava server. Let me know if you run into any problems or unexpected issues.

Thank you,
George

Quoting Scott Frees <[email protected]>:

George -

That looks right to me.  I don't have a developer account, so it would
be great if you could check that in.

Thanks!
Scott

On Fri, Feb 17, 2012 at 12:56 PM, George Waldon
<[email protected]> wrote:
Hi Scott,

Yes, well done. You need to fix rettype too. So, if I have it correct, we
should uncomment and have:

rettype = "gb";
retmode = "text";

and existing code should not be broken. What do you think? I can commit if
you do not have a developer account.

Thanks,
- George

Quoting Scott Frees <[email protected]>:

George - Thanks for your response.

I think I tracked down the problem. When building the FetchURL,
GenbankRichSequenceDB uses "genbank" as the db. In the
org.biojava.bio.seq.db.FetchURL constructor, rettype and retmode are
deliberately left unset when given "genbank" (see lines 54-55, which
are commented out):

        //      rettype = format;
        //      retmode = format;

Entrez updated their API on Wednesday
(http://www.ncbi.nlm.nih.gov/books/NBK25501/), and the release notes
say they now set a default retmode for each database. I'm new to
BioJava and Entrez, but I assume the "genbank" db used to always return
sequences as text, which is why FetchURL doesn't include the parameter
in the URL it builds. The default now appears to be XML, which breaks
the GenbankRichSequenceDB parser.
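To make that dependency on the server-side default explicit, here is a minimal sketch of an efetch URL builder that always sends both rettype and retmode, so the response format no longer depends on whatever default Entrez picks for a database. The class and method names are hypothetical (not BioJava API), the endpoint is the current public E-utilities efetch URL, and the "nucleotide" db name in main() is an assumption for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper (not BioJava API): builds an efetch URL that never
// relies on the per-database retmode default chosen by Entrez.
public class EfetchUrlSketch {
    // Public E-utilities efetch endpoint.
    public static final String BASE =
            "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi";

    public static String build(String db, String id, String rettype,
                               String retmode, String tool, String email) {
        return BASE
                + "?db=" + enc(db)
                + "&id=" + enc(id)
                // Both format parameters are always sent explicitly.
                + "&rettype=" + enc(rettype)
                + "&retmode=" + enc(retmode)
                + "&tool=" + enc(tool)
                + "&email=" + enc(email); // '@' is encoded as %40
    }

    // URL-encodes a single query value as UTF-8.
    private static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        // "nucleotide" is an assumed db name for this example.
        System.out.println(build("nucleotide", "NM_001110.2", "gb", "text",
                "mytool", "me@example.org"));
    }
}
```

Each value is URL-encoded individually, so an email address or tool name containing reserved characters cannot corrupt the query string.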

I confirmed this by subclassing GenbankRichSequenceDB to set the
retmode parameter to text, and that resolves the problem.

import java.net.MalformedURLException;
import java.net.URL;
import org.biojava.bio.seq.db.FetchURL;
import org.biojavax.bio.db.ncbi.GenbankRichSequenceDB;

// Workaround subclass (class name is illustrative) that forces plain-text output.
public class TextGenbankRichSequenceDB extends GenbankRichSequenceDB {
    @Override
    protected URL getAddress(String id) throws MalformedURLException {
        FetchURL seqURL = new FetchURL("Genbank", "text");
        String baseurl = seqURL.getbaseURL();
        String db = seqURL.getDB();
        // Request the GenBank flat file explicitly as text (added retmode=text)
        String url = baseurl + db + "&id=" + id
                + "&rettype=gb&retmode=text"
                + "&tool=" + getTool() + "&email=" + getEmail();
        return new URL(url);
    }
}

I think a more elegant solution would be to fix FetchURL itself so that
it sets the retmode parameter.
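One way the FetchURL fix could be structured is to map every supported format name to an explicit rettype/retmode pair instead of leaving the parameters unset. This is a minimal sketch under that assumption: the class name and the map are hypothetical and not how BioJava actually implements FetchURL; "gb"/"text" and "fasta"/"text" are the E-utilities values for GenBank flat files and FASTA. The lookup is case-insensitive because the thread passes "Genbank" with a capital G.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of the "fix FetchURL" idea: every supported format
// maps to an explicit rettype/retmode pair, so nothing is left unset.
public class FormatParamsSketch {
    // format name -> {rettype, retmode}
    private static final Map<String, String[]> PARAMS = new HashMap<String, String[]>();
    static {
        PARAMS.put("genbank", new String[] {"gb", "text"});
        PARAMS.put("fasta", new String[] {"fasta", "text"});
    }

    // Returns the query fragment for a format, e.g. "&rettype=gb&retmode=text".
    public static String queryFor(String format) {
        String[] p = PARAMS.get(format.toLowerCase(Locale.ROOT));
        if (p == null) {
            // Fail loudly rather than silently falling back to the server default.
            throw new IllegalArgumentException("Unsupported format: " + format);
        }
        return "&rettype=" + p[0] + "&retmode=" + p[1];
    }

    public static void main(String[] args) {
        System.out.println(queryFor("Genbank"));
    }
}
```

Rejecting unknown formats is a deliberate choice here: the bug in this thread came from silently relying on a server default that later changed.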

Regards -
Scott

On Thu, Feb 16, 2012 at 8:53 PM, George Waldon <[email protected]>
wrote:

Hello Scott,

This appears to be an exception thrown by the parser. Is there a way you
can fetch the sequence(s) as a text file before the exception occurs? It
would be interesting to see whether the exception is reproducible from
that file; you can send me the file if you want.
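Along the lines suggested above, one way to capture the raw efetch response before BioJava parses it is to copy the stream to a file and look at its first non-blank line. This is a minimal sketch: the class name, the output file name, and the LOCUS/XML heuristic are assumptions for illustration, not BioJava API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical debugging helper (not BioJava API): saves the raw efetch
// response so it can be inspected or sent to the list.
public class RawFetchSketch {
    // Copies the stream to a file, overwriting it if it already exists.
    public static long saveRaw(InputStream in, Path out) throws IOException {
        return Files.copy(in, out, StandardCopyOption.REPLACE_EXISTING);
    }

    // Rough heuristic on the first non-blank line: GenBank flat files start
    // with "LOCUS", XML responses (including the <?xml prolog) with '<'.
    public static String sniff(Path file) throws IOException {
        try (BufferedReader r = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = r.readLine()) != null) {
                String t = line.trim();
                if (t.isEmpty()) {
                    continue;
                }
                if (t.startsWith("<")) {
                    return "xml";
                }
                return t.startsWith("LOCUS") ? "genbank" : "unknown";
            }
        }
        return "empty";
    }

    public static void main(String[] args) throws IOException {
        URL url = URI.create(args[0]).toURL(); // efetch URL passed on the command line
        Path out = Paths.get("efetch-response.txt");
        try (InputStream in = url.openStream()) {
            saveRaw(in, out);
        }
        System.out.println(out + ": " + sniff(out));
    }
}
```

Run it with the same efetch URL GenbankRichSequenceDB would build; a result of "xml" would match the Parse_block=<?xml version="1.0"?> shown in the trace.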

Regards,
George

Quoting Scott Frees <[email protected]>:

Hello -

I have developed an application that searches and compares
g-quadruplexes within mRNA. The web application has been running
without any problems on several different web servers for over a year.
Suddenly, just this week, it is unable to download sequence data
using GenbankRichSequenceDB. Has anyone else had this problem?

We are using BioJava 1.8.1.

Below is the exception trace, followed by a small test app that
generates the exception. This code worked without any problems
prior to Tuesday this week, and we haven't made any modifications
to our application.
------------------------------------------------------
org.biojava.bio.BioException: Failed to read Genbank sequence
        at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:163)
        at Tester.main(Tester.java:11)
Caused by: org.biojava.bio.BioException: Could not read sequence
        at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
        at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:159)
        ... 1 more
Caused by: org.biojava.bio.seq.io.ParseException:

A Exception Has Occurred During Parsing.
Please submit the details that follow to [email protected] or post
a bug report to http://bugzilla.open-bio.org/

Format_object=org.biojavax.bio.seq.io.GenbankFormat
Accession=null
Id=null
Comments=Bad section
Parse_block=<?xml version="1.0"?>
Stack trace follows ....

        at org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:620)
        at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:279)
        at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
        ... 2 more
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -4
        at java.lang.String.substring(Unknown Source)
        at java.lang.String.substring(Unknown Source)
        at org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:610)
        ... 4 more
-----------------------------


import org.biojava.bio.BioException;
import org.biojava.bio.seq.db.IllegalIDException;
import org.biojavax.bio.db.ncbi.GenbankRichSequenceDB;
import org.biojavax.bio.seq.RichSequence;

public class Tester {
    public static void main(String[] args) {
        String id = "NM_001110.2";  // Issue occurs with any ID
        GenbankRichSequenceDB ncbi = new GenbankRichSequenceDB();
        try {
            RichSequence rs = ncbi.getRichSequence(id);
            System.out.println(rs.seqString());
        } catch (IllegalIDException e) {
            e.printStackTrace();
        } catch (BioException e) {
            e.printStackTrace();
        }
    }
}

_______________________________________________
Biojava-l mailing list  -  [email protected]
http://lists.open-bio.org/mailman/listinfo/biojava-l




--------------------------------
George Waldon