Simon, thanks for the information. I am using Digester 1.6, and having reviewed bug 28739, it appears to be an exact match for the problem I asked about, which you apparently fixed about a year ago! The only difference is that the bug report does not specifically mention an UnknownHostException, which is the result of the incorrectly formatted URL; it just calls it a "very uninformative exception".
Thanks again for the help!

-----Original Message-----
From: Simon Kitching [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 21, 2005 7:13 PM
To: Jakarta Commons Users List
Subject: Re: [Digester] UnknownHostException when file contains SYSTEM dtd

On Thu, 2005-07-21 at 14:51 -0500, Mike Miller wrote:
> After struggling with a digester problem all day, I finally found a posting
> that was about 1 ½ years old that help solve my problem - but I would like to
> see if someone can provide an explanation so that I can learn and understand
> what was happening.
>
> The problem: I have several files that I am processing with the digester.
> The xml files and the dtd reside in the same directory within my web
> application. The first couple of lines of one of the files is shown below -
> using only a SYSTEM id.
>
> <?xml version="1.0" encoding="UTF-8"?>
> <!DOCTYPE root SYSTEM "ReportType.dtd">
>
> <root>
> ...
>
> When calling digester.parse() using a File object, the call results in a
> "UnknownHostException c" where c is the windows drive where my files are
> located - apparently the systemId was generated as
> file://c/mydir/conf/creport/reporttypes/ReportType.dtd
> <file:///\\c\mydir\conf\creport\reporttypes\ReportType.dtd> and the c is
> interpreted as a machine/host name.
>
> Changing the code to call digester.parse() with the String parameter
> providing the full path of the file works.
>
> Looking at the Digester code, I guess this may be more of a SAX question
> because I can see where the parse() methods convert their input into an
> InputSource, but why does the parse() version with a File call the
> setSystemId()?
>

The reason that setSystemId is called is so that resources referenced from the xml document (esp. the DTD file) are looked up relative to the original file parsed.
You say in your example above that the xml file and the dtd are in "the same directory" but if we never tell the xml parser where the xml was read from, how's it going to find the dtd?

By default, if you pass an InputSource (which just wraps a stream) to the parser without specifying the systemId, then any relative references to DTDs etc. are just looked up relative to the current working directory of the application - the parser can't possibly deduce the real original source of a stream.

Error messages generated by the parser also include the systemId of the document: if this isn't set then the error messages can be less than helpful.

Note that InputSource.setSystemId has nothing to do with the SYSTEM value in the xml document, other than that it sets a base path that is used for lookups if the SYSTEM value is a relative path.

The UnknownHostException "c" stuff isn't something that has been reported before as far as I know. I am somewhat surprised you are seeing this, as I would have thought what you are doing would be common and therefore other people would have encountered this. I don't use MS-Windows so can't help you with debugging but if you can provide a patch I'll check and commit it.

BTW, which version of Digester are you using? Bugzilla#28739 has been fixed in 1.7 which might be related to your issue.

Regards,

Simon

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
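
For anyone landing on this thread later, here is a minimal sketch of the two points discussed above, using only the JDK's own SAX parser rather than Digester itself. It is an illustration under stated assumptions, not the Digester implementation or the actual fix for bug 28739. The class name SystemIdSketch, the helper names, the file name report.xml, the "file:///c:/..." form of the path, and the "greeting" entity are all made up for the example. The first helper shows why a URL of the form file://c/... ends up with "c" treated as a host name; the second shows an InputSource being given a systemId derived from File.toURI(), so that the relative SYSTEM id "ReportType.dtd" is resolved next to the xml file.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SystemIdSketch {

    /** Host component that java.net.URI extracts from a URL string (null if there is none). */
    static String hostOf(String url) throws Exception {
        return new URI(url).getHost();
    }

    /** Writes a small DTD and an xml file that refers to it by a relative SYSTEM id. */
    static File writeSample(File dir) throws Exception {
        File dtd = new File(dir, "ReportType.dtd");
        File xml = new File(dir, "report.xml"); // hypothetical file name
        // The "greeting" entity is an assumption, added so we can tell the DTD was really read.
        Files.write(dtd.toPath(),
                "<!ELEMENT root (#PCDATA)>\n<!ENTITY greeting \"hello\">\n"
                        .getBytes(StandardCharsets.UTF_8));
        Files.write(xml.toPath(),
                ("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                        + "<!DOCTYPE root SYSTEM \"ReportType.dtd\">\n"
                        + "<root>&greeting;</root>\n").getBytes(StandardCharsets.UTF_8));
        return xml;
    }

    /** Parses the file from a stream, telling the parser where the stream came from. */
    static String parseWithSystemId(File xmlFile) throws Exception {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length); // collect element text, including expanded entities
            }
        };
        try (InputStream in = new FileInputStream(xmlFile)) {
            InputSource source = new InputSource(in);
            // Base location for relative lookups; File.toURI() produces a well-formed file: URI.
            source.setSystemId(xmlFile.toURI().toString());
            SAXParserFactory.newInstance().newSAXParser().parse(source, handler);
        }
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        // Two slashes: the first path segment is parsed as the authority, so the host is "c".
        System.out.println(hostOf("file://c/mydir/conf/creport/reporttypes/ReportType.dtd"));
        // Three slashes: empty authority, so there is no host and the drive stays in the path.
        System.out.println(hostOf("file:///c:/mydir/conf/creport/reporttypes/ReportType.dtd"));

        File dir = Files.createTempDirectory("systemid-sketch").toFile();
        File xml = writeSample(dir);
        System.out.println(parseWithSystemId(xml));
    }
}
```

If the setSystemId call is left out, the same parse would look for ReportType.dtd relative to the application's current working directory, as Simon describes, and would normally fail unless a copy of the DTD happens to be there.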
