Jitesh - I forwarded your response to the list so that everyone can get the chance to reply.
cheers,
Richard

Begin forwarded message:

> From: jitesh dundas <[email protected]>
> Date: 24 November 2009 14:47:00 GMT
> To: Richard Holland <[email protected]>
> Subject: Re: [Biojava-l] Java Error:- XML Parsing Error: XML or text
> declaration not at start of entity
>
> Dear Sir,
>
> Thank you for your reply. I figured this problem out by sending records in
> small sets, e.g. 20 records per page.
>
> It is like a pagination functionality. For each new page, we need to hit the
> URL.
>
> My functionality is working fine. I will be happy to share my code with you
> (and anyone) who needs it.
>
> I simply fetch data from the URL and write to an XML file. Next I just read
> the XML file and show them in the web page to the user.
>
> Again, I need to know how to fetch records for the protein database. Two types
> of searches are needed, I suspect.
>
> First we use the Esearch utility and then the Efetch utility to get the data
> of the specific protein.
>
> I welcome any suggestions on this!
>
> Thank you everyone for your help.
>
> Regards,
> Jitesh Dundas
>
> On 11/24/09, Richard Holland <[email protected]> wrote:
> Your program takes an input 'txtURLString' - could you give an example of the
> value that this usually contains? I suspect that this URL is where your
> problem lies but without seeing an example value I couldn't say for sure.
>
> thanks,
> Richard
>
> On 8 Nov 2009, at 10:22, jitesh dundas wrote:
>
> > Dear Sir,
> >
> > My program is working fine and can send me an xml file with 20
> > records. However, it does not allow me to send large amounts of
> > records.
> >
> > For example, if I enter "cancer" it will return only 20 records.
> >
> > Can you please tell me what I should do next to get all those records.
> > Thank you in advance
> >
> > Regards,
> > Jitesh Dundas
> >
> > On Sun, Nov 1, 2009 at 9:36 PM, Andreas Prlic <[email protected]> wrote:
> >>
> >> Hi Jitesh,
> >>
> >> It is hard to read your code with all the formatting off probably due to
> >> email and many commented lines that don't seem to get used. Can you
> >> provide the stacktrace, so we can see what part of biojava is affected?
> >>
> >> Probably a good strategy to write and debug this is to simplify the problem
> >> into smaller steps. Try to first download the files you want to parse and
> >> write the code to parse them from the local file. That will avoid any
> >> issues you might encounter with networking and server/client
> >> communication. Once the parsing is working you could take it to the next
> >> step and add the server communication...
> >>
> >> Andreas
> >>
> >>
> >> On Sun, Nov 1, 2009 at 7:41 AM, jitesh dundas <[email protected]> wrote:
> >>>
> >>> Hi friends,
> >>>
> >>> I am getting this error on doing a post (using the code below) to this
> >>> url->
> >>> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=cancer&reldate=10
> >>>
> >>> I have written this code in .jsp file. Later I will change it into
> >>> servlet.
> >>>
> >>> Error:-
> >>> XML Parsing Error: XML or text declaration not at start of entity
> >>> Location:
> >>> http://localhost:8080/ProteomDb/ImportFromPubmed2.jsp?txtDbName=pubmed&txtTerm=cancer&txtreldate=10&comSDay=01&comSMonth=01&txtSYear=&comEDay=01&comEMonth=01&txtEYear=&txtURLString=http%3A%2F%2Feutils.ncbi.nlm.nih.gov%2Fentrez%2Feutils%2Fesearch.fcgi%3Fdb%3Dpubmed%26term%3Dcancer%26reldate%3D10&txtsubmit=Fetch+Data+From+NCBI
> >>> Line Number 11, Column 1:<?xml version="1.0" ?><!DOCTYPE eSearchResult
> >>> PUBLIC "-//NLM//DTD eSearchResult, 11 May 2002//EN" "
> >>> http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSearch_020511.dtd"><eSearchResult><Count>2034</Count><RetMax>20</RetMax><RetStart>0</RetStart><IdList>
> >>> <Id>19877350</Id> <Id>19877304</Id> <Id>19877297</Id>
> >>> <Id>19877284</Id> <Id>19877271</Id> <Id>19877265</Id>
> >>> <Id>19877250</Id> <Id>19877245</Id> <Id>19877226</Id>
> >>> <Id>19877210</Id> <Id>19877179</Id> <Id>19877175</Id>
> >>> <Id>19877161</Id> <Id>19877159</Id> <Id>19877158</Id>
> >>> <Id>19877123</Id> <Id>19877122</Id> <Id>19877120</Id>
> >>> <Id>19877119</Id> <Id>19877118</Id>
> >>> </IdList><TranslationSet><Translation> <From>cancer</From>
> >>> <To>"neoplasms"[MeSH Terms] OR "neoplasms"[All Fields] OR "cancer"[All
> >>> Fields]</To> </Translation></TranslationSet><TranslationStack>
> >>> <TermSet> <Term>"neoplasms"[MeSH Terms]</Term> <Field>MeSH
> >>> Terms</Field> <Count>2082133</Count> <Explode>Y</Explode>
> >>> </TermSet> <TermSet> <Term>"neoplasms"[All Fields]</Term>
> >>> <Field>All
> >>> Fields</Field> <Count>1634731</Count> <Explode>Y</Explode>
> >>> </TermSet> <OP>OR</OP> <TermSet> <Term>"cancer"[All Fields]</Term>
> >>> <Field>All Fields</Field> <Count>902537</Count> <Explode>Y</Explode>
> >>> </TermSet> <OP>OR</OP> <OP>GROUP</OP> <TermSet>
> >>> <Term>2009/10/22[EDAT]</Term> <Field>EDAT</Field> <Count>0</Count>
> >>> <Explode>Y</Explode> </TermSet> <TermSet>
> >>> <Term>2009/11/01[EDAT]</Term> <Field>EDAT</Field> <Count>0</Count>
> >>> <Explode>Y</Explode> </TermSet> <OP>RANGE</OP> <OP>AND</OP>
> >>> </TranslationStack><QueryTranslation>("neoplasms"[MeSH Terms] OR
> >>> "neoplasms"[All Fields] OR "cancer"[All Fields]) AND 2009/10/22[EDAT] :
> >>> 2009/11/01[EDAT]</QueryTranslation></eSearchResult>
> >>> ^
> >>>
> >>> As you can see, the XML output is coming fine but the above error does not
> >>> go away. The output via this program should be just like manually hitting
> >>> the above URL in the browser.
> >>> The browser is Mozilla Firefox.
> >>>
> >>> Code:-
> >>>
> >>> <%@ page language = "java" %>
> >>> <%@ page import = "java.sql.*" %>
> >>> <%@ page import = "java.util.*" %>
> >>> <%@ page import = "java.io.*" %>
> >>> <%@ page import="java.lang.*" %>
> >>> <%@ page import="java.net.*" %>
> >>> <%@ page import="java.nio.*" %>
> >>> <%@ page contentType="text/xml; charset=utf-8" pageEncoding="UTF-8" %>
> >>>
> >>>
> >>> <%
> >>>
> >>> try
> >>> {
> >>>     //String str = "<?xml version='1.0' ?>";
> >>>     //out.println("<?xml version='1.0' encoding='utf-8' ?>");
> >>>
> >>>     Properties systemSettings = System.getProperties();
> >>>     systemSettings.put("http.proxyHost", "********");
> >>>     systemSettings.put("http.proxyPort", "******");
> >>>     systemSettings.put("sun.net.client.defaultConnectTimeout", "10000");
> >>>     systemSettings.put("sun.net.client.defaultReadTimeout", "10000");
> >>>
> >>>     //out.println("Properties Set");
> >>>     Authenticator.setDefault(new Authenticator()
> >>>     {
> >>>         protected PasswordAuthentication getPasswordAuthentication()
> >>>         {
> >>>             return new PasswordAuthentication("**",
> >>>                 "******".toCharArray()); // specify ur user name password of iitb login
> >>>         }
> >>>     });
> >>>
> >>>
> >>>     System.setProperties(systemSettings);
> >>>     //out.println("After Authentication & Properties Settings");
> >>>
> >>>     //create xml file.
> >>>     //the input to google api
> >>>     //String textAreaContent = request.getParameter("text");
> >>>     String textAreaContent = "This si a tst";
> >>>
> >>>     String str = "<?xml version='1.0' encoding='utf-8' ?>";
> >>>
> >>>     //xml file generation ends here..
> >>>     //FetchDataFromNCBI_URLString.jsp
> >>>     String URLString = request.getParameter("txtURLString").trim();
> >>>
> >>>     //URL url = new URL("
> >>>     http://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=protein&term=BAA20519
> >>>     ");
> >>>     URL url = new URL(URLString); //url string taken from user input.
> >>>     HttpURLConnection connection = null;
> >>>
> >>>     connection = (HttpURLConnection) url.openConnection();
> >>>     System.out.println("After open connection");
> >>>     connection.setRequestMethod("POST");
> >>>     connection.setDoInput(true);
> >>>     connection.setDoOutput(true);
> >>>
> >>>     connection.setUseCaches(false);
> >>>     connection.setAllowUserInteraction(false);
> >>>     //connection.setFollowRedirects(true);
> >>>     //connection.setInstanceFollowRedirects(true);
> >>>     //System.out.println("Before-------------------");
> >>>     connection.setRequestProperty ("Content-Type","text/xml;
> >>>     charset=\"utf-8\"");
> >>>     //System.out.println("After-------------------");
> >>>
> >>>     //System.out.println(""+ connection.getOutputStream());
> >>>
> >>>     //System.out.println("After dataoutputstream..Line No-65");
> >>>
> >>>     //System.out.println("Response Code="+ connection.getResponseCode);
> >>>
> >>>     OutputStreamWriter dosout = new
> >>>     OutputStreamWriter(connection.getOutputStream());
> >>>     //System.out.println("After dosout object..Line No-63");
> >>>     //dosout.write(str);
> >>>     dosout.close ();
> >>>
> >>>     BufferedReader in = new BufferedReader( new InputStreamReader(
> >>>     connection.getInputStream()));
> >>>
> >>>     String decodedString;
> >>>     String tempstr = "";
> >>>
> >>>
> >>>     while ((decodedString = in.readLine()) != null)
> >>>     {
> >>>         tempstr = tempstr + decodedString;
> >>>         //out.println(decodedString);
> >>>     }
> >>>     out.println(tempstr);
> >>>     in.close();
> >>> }
> >>> catch(Exception ex)
> >>> {
> >>>     out.println("Exception->"+ex);
> >>>     PrintWriter pw = response.getWriter();
> >>>     ex.printStackTrace(pw);
> >>> }
> >>>
> >>>
> >>> %>
> >>>
> >>> Thanks in advance..
> >>>
> >>> Regards,
> >>> Jitesh Dundas
> >>>
> >>> _______________________________________________
> >>> Biojava-l mailing list - [email protected]
> >>> http://lists.open-bio.org/mailman/listinfo/biojava-l
> >>
> >>
> > <ImportFromPubmed3.jsp>_______________________________________________
> > Biojava-l mailing list - [email protected]
> > http://lists.open-bio.org/mailman/listinfo/biojava-l
>
> --
> Richard Holland, BSc MBCS
> Operations and Delivery Director, Eagle Genomics Ltd
> T: +44 (0)1223 654481 ext 3 | E: [email protected]
> http://www.eaglegenomics.com/
>
>

--
Richard Holland, BSc MBCS
Operations and Delivery Director, Eagle Genomics Ltd
T: +44 (0)1223 654481 ext 3 | E: [email protected]
http://www.eaglegenomics.com/

_______________________________________________
Biojava-l mailing list - [email protected]
http://lists.open-bio.org/mailman/listinfo/biojava-l
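
For anyone following the thread, here is a rough sketch of the paging approach Jitesh describes (one esearch request per page of 20 records), followed by an efetch request for the ids that come back. It is not the code from ImportFromPubmed3.jsp, which was not posted inline. The class EutilsPaging and all of its method names are hypothetical and are not part of BioJava. The sketch assumes the esearch parameters retstart and retmax (which match the RetStart and RetMax elements in the response quoted above) and the efetch parameters id and retmode. It only builds URLs and reads an esearch response that has already been downloaded as a string, in line with Andreas's advice to get the parsing working separately from the networking.

```java
import java.io.StringReader;
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration; not part of BioJava or the JSP above.
class EutilsPaging {
    static final String BASE = "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/";

    // esearch URL for one page: retstart is the offset, retmax the page size.
    static String esearchUrl(String db, String term, int retStart, int retMax) {
        return BASE + "esearch.fcgi?db=" + encode(db) + "&term=" + encode(term)
                + "&retstart=" + retStart + "&retmax=" + retMax;
    }

    // One esearch URL per page, enough pages to cover `total` records.
    static List<String> esearchPageUrls(String db, String term, int total, int pageSize) {
        List<String> urls = new ArrayList<String>();
        for (int start = 0; start < total; start += pageSize) {
            urls.add(esearchUrl(db, term, start, pageSize));
        }
        return urls;
    }

    // efetch URL for the ids returned by esearch, joined with commas.
    static String efetchUrl(String db, List<String> ids) {
        StringBuilder joined = new StringBuilder();
        for (String id : ids) {
            if (joined.length() > 0) joined.append(',');
            joined.append(id);
        }
        return BASE + "efetch.fcgi?db=" + encode(db) + "&id=" + joined + "&retmode=xml";
    }

    // Total hit count: the first <Count> in document order is the top-level one.
    static int totalCount(String esearchXml) {
        NodeList counts = parse(esearchXml).getElementsByTagName("Count");
        if (counts.getLength() == 0) {
            throw new IllegalArgumentException("no <Count> element in response");
        }
        return Integer.parseInt(counts.item(0).getTextContent().trim());
    }

    // All <Id> values on the current page.
    static List<String> ids(String esearchXml) {
        NodeList nodes = parse(esearchXml).getElementsByTagName("Id");
        List<String> result = new ArrayList<String>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent().trim());
        }
        return result;
    }

    private static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Do not download the DTD named in the DOCTYPE while parsing.
            factory.setFeature(
                "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not a parsable esearch response", e);
        }
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }
}
```

A caller would fetch the first page, read totalCount from it, then request the remaining URLs from esearchPageUrls one by one, collecting ids from each response before building the efetch URL for the database of interest (for example db=protein).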
