> From: Donald Ball [mailto:[EMAIL PROTECTED]]
>
> On Sun, 10 Mar 2002, David Crossley wrote:
...
> > > the xml returned from the nih server will begin like so:
> > >
> > > <?xml version="1.0"?>
> > > <!DOCTYPE QueryResult PUBLIC "-//NLM//DTD QueryResult, 22 Jan 2002//EN"
> > > "/entrez/query/DTD/pmqty_020122.dtd">
> > > <QueryResult>
> > >
> > > unfortunately, i get an exception when cocoon tries to parse this
> > > document. it claims that it cannot access the dtd:
> > >
> > > java.net.MalformedURLException: no protocol:
> > > /entrez/query/DTD/pmqty_020122.dtd
...
> but it shouldn't do that. according to the xml spec on system ids:
>
> http://www.w3.org/TR/REC-xml#dt-sysid
>
> "Unless otherwise provided by information outside the scope of this
> specification (e.g. a special XML element type defined by a particular
> DTD, or a processing instruction defined by a particular application
> specification), relative URIs are relative to the location of the resource
> within which the entity declaration occurs."
>
> the location of the resource in this case is clearly its url:
>
> http://www.ncbi.nlm.nih.gov/entrez/utils/pmqty.fcgi?db=PubMed&mode=XML&dispmax=999&term={1}[au]
>
> and that's the context in which the system identifier should be resolved,
> right? (i could easily be wrong, i'm a little sketchy on the doctype
> stuff. the spec seems clear enough on this point to me tho.)
>
> if so, then while entity catalogs are a nice workaround, they don't work
> unless you know in advance the dtd of the remote xml and also know that
> it's not going to change. otherwise, your webapp can break without notice.
> that's not cool! i'm sorry that i've not been able to come up with a patch
> for this, i can't figure out which component is guilty. any clues?

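For reference, here is a minimal sketch of the resolution rule quoted above, using only java.net. The class and method names are hypothetical, and the base URL is shortened to the script path because the `{1}[au]` part of the real query string is not legal in a java.net.URI. It is an illustration of the spec's rule, not of what Cocoon or Xerces does internally.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical helper, for illustration only.
class SystemIdResolution {

    // Resolve a (possibly relative) system identifier against the URL
    // of the document that contains the DOCTYPE declaration.
    static String resolve(String documentUrl, String systemId) {
        return URI.create(documentUrl).resolve(systemId).toString();
    }

    // Turning the bare system identifier into a URL with no base fails,
    // which is the kind of error reported in the stack trace.
    @SuppressWarnings("deprecation")
    static boolean failsWithoutBase(String systemId) {
        try {
            new URL(systemId);
            return false;
        } catch (MalformedURLException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String base = "http://www.ncbi.nlm.nih.gov/entrez/utils/pmqty.fcgi";
        String dtd = "/entrez/query/DTD/pmqty_020122.dtd";
        System.out.println("without base fails: " + failsWithoutBase(dtd));
        System.out.println("with base: " + resolve(base, dtd));
    }
}
```

If the parser is never told the document's own URL, it has nothing to resolve the path against, so a failure like the one above would be expected.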
Have you tried to parse this XML with standalone Xerces?

Vadim

> - donald
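A standalone check along those lines might look like the sketch below. It uses the JAXP parser bundled with the JDK rather than a specific Xerces release, and the class name, the method name and the shortened base URL are assumptions made for the example. The entity resolver only records which DTD location the parser asks for and hands back an empty DTD, so nothing is fetched over the network.

```java
import java.io.StringReader;
import java.util.concurrent.atomic.AtomicReference;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical test harness, for illustration only.
class StandaloneParseCheck {

    // Same prolog as the document in the thread, with an empty root element.
    static final String XML =
        "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE QueryResult PUBLIC \"-//NLM//DTD QueryResult, 22 Jan 2002//EN\"\n"
        + "  \"/entrez/query/DTD/pmqty_020122.dtd\">\n"
        + "<QueryResult/>";

    // Parse the document with the given base system id and return the
    // DTD system id that the parser passes to the entity resolver.
    static String requestedDtd(String baseSystemId) {
        AtomicReference<String> seen = new AtomicReference<>();
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setEntityResolver((publicId, systemId) -> {
                seen.set(systemId);
                // Supply an empty DTD so no remote access is attempted.
                return new InputSource(new StringReader(""));
            });
            InputSource in = new InputSource(new StringReader(XML));
            // Tell the parser where the document came from.
            in.setSystemId(baseSystemId);
            builder.parse(in);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return seen.get();
    }

    public static void main(String[] args) {
        System.out.println(
            requestedDtd("http://www.ncbi.nlm.nih.gov/entrez/utils/pmqty.fcgi"));
    }
}
```

If the parser resolves the DTD correctly here, the likelier culprit is whichever component hands the stream to the parser without setting a system id on the InputSource.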