Hi Scott,
Sorry for the delayed response. The fix was a little more complicated
than expected, and I hope it does not break any existing code. I added
a test for GenbankRichSequenceDB. Everything should be committed by now,
and you can check out the result from the BioJava server. Let me know if
you run into any problems or unexpected issues.
Thank you,
George
Quoting Scott Frees <[email protected]>:
George -
That looks right to me. I don't have a developer account, so it would
be great if you could check that in.
Thanks!
Scott
On Fri, Feb 17, 2012 at 12:56 PM, George Waldon
<[email protected]> wrote:
Hi Scott,
Yes, well done. You need to fix rettype too. So, if I have it correct, we
should uncomment and have:
rettype = "gb"
retmode = "txt"
and existing code should not be broken. What do you think? I can commit if
you do not have a developer account.
Thanks,
- George
Quoting Scott Frees <[email protected]>:
George - Thanks for your response.
I think I tracked down the problem. When building the FetchURL,
GenbankRichSequenceDB uses "genbank" as the db. In the
org.biojava.bio.seq.db.FetchURL constructor, rettype and retmode are
specifically not set when given "genbank" - see lines 54-55 commented
out.
// rettype = format;
// retmode = format;
Entrez updated their API (http://www.ncbi.nlm.nih.gov/books/NBK25501/)
on Wednesday, and the release notes say they have set a default
retmode for each database. I'm new to BioJava and Entrez, but I can
only assume that the "genbank" db used to always return sequences as
text, which is why FetchURL doesn't include the parameter in the URL
it builds. It looks like the default is now XML, which breaks the
GenbankRichSequenceDB parser.
I proved it out by subclassing GenbankRichSequenceDB to set the
retmode parameter as text, and the problem is resolved.
@Override
protected URL getAddress(String id) throws MalformedURLException {
    FetchURL seqURL = new FetchURL("Genbank", "text");
    String baseurl = seqURL.getbaseURL();
    String db = seqURL.getDB();
    // added retmode=text
    String url = baseurl + db + "&id=" + id + "&rettype=gb&retmode=text&tool="
            + getTool() + "&email=" + getEmail();
    return new URL(url);
}
I think a more elegant solution would be to simply fix FetchURL to use
the retmode parameter.
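A minimal sketch of what that fix might look like, written as a standalone class rather than a patch to org.biojava.bio.seq.db.FetchURL. The class name EfetchUrlSketch, the base URL, and the "nucleotide" database name are assumptions for illustration; only the rettype=gb and retmode=text values come from the workaround above.

```java
import java.util.Locale;

/** Hypothetical stand-in for FetchURL; this is not the BioJava class. */
class EfetchUrlSketch {
    // Assumed efetch endpoint; the real FetchURL supplies its own base URL.
    static final String BASE =
            "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?";

    final String db;
    final String rettype;
    final String retmode;

    EfetchUrlSketch(String databaseName) {
        String name = databaseName.trim().toLowerCase(Locale.ROOT);
        if (name.equals("genbank")) {
            db = "nucleotide"; // assumed Entrez database name
            rettype = "gb";    // GenBank record type
            retmode = "text";  // ask for plain text explicitly instead of relying on the default
        } else {
            throw new IllegalArgumentException("Unsupported database: " + databaseName);
        }
    }

    /** Builds the fetch URL with rettype and retmode always present. */
    String urlFor(String id) {
        return BASE + "db=" + db + "&id=" + id
                + "&rettype=" + rettype + "&retmode=" + retmode;
    }

    public static void main(String[] args) {
        System.out.println(new EfetchUrlSketch("Genbank").urlFor("NM_001110.2"));
    }
}
```

The point of the sketch is only that both parameters are set once, where the database is chosen, so callers such as GenbankRichSequenceDB would not need to append them by hand.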
Regards -
Scott
On Thu, Feb 16, 2012 at 8:53 PM, George Waldon <[email protected]>
wrote:
Hello Scott,
This appears to be an exception thrown by the parser. Is there a way you
can fetch the sequence(s) as a text file before the exception occurs? It
would be interesting to see if you can reproduce the exception; you can
send me the file if you want.
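One quick check on a saved response is whether its first line looks like a GenBank flat file (which begins with LOCUS) or like XML, as the Parse_block in the trace suggests. The sketch below is only an illustration; the class and method names are hypothetical and not part of BioJava.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper for inspecting a raw response before it reaches the parser. */
class ResponseSniffer {

    /** Returns the first non-blank line of the stream, trimmed, or "" if there is none. */
    static String firstLine(InputStream in) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (!line.trim().isEmpty()) {
                    return line.trim();
                }
            }
            return "";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Rough test: XML responses start with a tag or an XML declaration. */
    static boolean looksLikeXml(String line) {
        return line.startsWith("<");
    }

    /** GenBank flat-file records start with a LOCUS line. */
    static boolean looksLikeGenbank(String line) {
        return line.startsWith("LOCUS");
    }

    public static void main(String[] args) {
        // Sample input taken from the Parse_block line of the reported trace.
        InputStream sample = new ByteArrayInputStream(
                "<?xml version=\"1.0\"?>\n".getBytes(StandardCharsets.UTF_8));
        String line = firstLine(sample);
        System.out.println(looksLikeGenbank(line) ? "genbank"
                : looksLikeXml(line) ? "xml" : "unknown");
    }
}
```

To run the same check against the live service, the stream returned by `new URL(...).openStream()` for the address that GenbankRichSequenceDB builds could be passed to firstLine in place of the sample.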
Regards,
George
Quoting Scott Frees <[email protected]>:
Hello -
I have developed an application that searches and compares
g-quadruplexes within mRNA. The web application has been running
without any problems on several different web servers for over a year.
Suddenly, just this week, it is unable to download sequence data
using GenbankRichSequenceDB. Has anyone else had this problem?
We are using BioJava 1.8.1.
Below is the exception trace, and the code that follows is a small
test app that generates the exception. This code worked without any
problems prior to Tuesday this week, and we haven't made any
modification to our application.
------------------------------------------------------
org.biojava.bio.BioException: Failed to read Genbank sequence
    at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:163)
    at Tester.main(Tester.java:11)
Caused by: org.biojava.bio.BioException: Could not read sequence
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:113)
    at org.biojavax.bio.db.ncbi.GenbankRichSequenceDB.getRichSequence(GenbankRichSequenceDB.java:159)
    ... 1 more
Caused by: org.biojava.bio.seq.io.ParseException:
A Exception Has Occurred During Parsing.
Please submit the details that follow to [email protected] or post
a bug report to http://bugzilla.open-bio.org/
Format_object=org.biojavax.bio.seq.io.GenbankFormat
Accession=null
Id=null
Comments=Bad section
Parse_block=<?xml version="1.0"?>
Stack trace follows ....
    at org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:620)
    at org.biojavax.bio.seq.io.GenbankFormat.readRichSequence(GenbankFormat.java:279)
    at org.biojavax.bio.seq.io.RichStreamReader.nextRichSequence(RichStreamReader.java:110)
    ... 2 more
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: -4
    at java.lang.String.substring(Unknown Source)
    at java.lang.String.substring(Unknown Source)
    at org.biojavax.bio.seq.io.GenbankFormat.readSection(GenbankFormat.java:610)
    ... 4 more
-----------------------------
import org.biojava.bio.BioException;
import org.biojava.bio.seq.db.IllegalIDException;
import org.biojavax.bio.db.ncbi.GenbankRichSequenceDB;
import org.biojavax.bio.seq.RichSequence;

public class Tester {
    public static void main(String args[]) {
        String id = "NM_001110.2"; // Issue occurs with any ID
        GenbankRichSequenceDB ncbi = new GenbankRichSequenceDB();
        try {
            RichSequence rs = ncbi.getRichSequence(id);
            System.out.println(rs.seqString());
        } catch (IllegalIDException e) {
            e.printStackTrace();
        } catch (BioException e) {
            e.printStackTrace();
        }
    }
}
_______________________________________________
Biojava-l mailing list - [email protected]
http://lists.open-bio.org/mailman/listinfo/biojava-l
--------------------------------
George Waldon