> But why is a DTD necessary to determine ignorable white space when
> the whitespace I'm talking about is between elements:
>
> <car>
>   <color>red</color><size>large</size>
> </car>
>
> is equivalent to
>
> <car>
>   <color>red</color>
>   <size>large</size>
> </car>

If not for a DTD, how do you know that the whitespace between </color>
and <size> is ignorable? If <car> contains mixed content, it wouldn't be
ignorable.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Noah Davis
Sent: Thursday, May 18, 2006 4:43 PM
To: dom4j-user@lists.sourceforge.net
Subject: [dom4j-user] Ignorable white space and confusion

Forgive me if this isn't a dom4j-specific question. I turned to dom4j in
hopes that it would help me solve a problem I was having: comparing two
XML documents.

I understand that comparing two XML documents isn't trivial, which is
why I've turned to third-party tools. However, I keep bumping my head
against the wall here with regard to white space and what is considered
"ignorable white space".

Articles I've read on the subject point out that it's necessary by
default to preserve white space when parsing an XML document, because
the white space could represent actual significant data. Then they show
an example like:

<signature>
---------
Bill Ford
11 West Lane
Phili
</signature>

Or an example with mixed content like:

<markup> my <b>body</b> is strong</markup>

OK, I understand that. And maybe a DTD can help determine what's
significant here and what's not. But why is a DTD necessary to determine
ignorable white space when the whitespace I'm talking about is between
elements:

<car>
  <color>red</color><size>large</size>
</car>

is equivalent to

<car>
  <color>red</color>
  <size>large</size>
</car>

The logic that determines this is ignorable white space doesn't seem
that complicated: if the whitespace is the only thing between two
elements, ignore it. This may seem like a special case, but it sure
seems like it'd be a pretty prominent special case, especially when XML
is being used for data as opposed to documents.

Is there any tool in the alphabet soup of XML libraries -- JAXP, DOM,
SAX, JDOM -- that recognizes this and allows me to ignore this kind of
whitespace without having to specify a DTD? I tried dom4j's
NodeComparator and didn't have any luck.

If not, can anyone offer a recommendation of a simple way to strip this
whitespace so I can compare two obviously equal XML documents?

_______________________________________________
dom4j-user mailing list
dom4j-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dom4j-user
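
For what it's worth, the rule proposed in the original message (drop
whitespace that is the only text between elements) can be approximated
without a DTD. What follows is only a rough sketch using the standard
JAXP DOM API rather than dom4j. The class name WhitespaceStripper and
its methods are made up for illustration. It also bakes in an
assumption that answers the mixed-content objection only partially: an
element is treated as "element-only" when it has at least one child
element and no non-blank text of its own, which is a guess based on the
instance, not on a content model.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of dom4j or JAXP.
final class WhitespaceStripper {

    // Parse an XML string into a DOM document (non-validating, no DTD needed).
    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Heuristic: if an element has child elements and every text node it
    // directly contains is blank, assume those text nodes are just layout
    // and remove them. Elements with any real text (mixed content) or with
    // CDATA sections are left untouched.
    static void stripInterElementWhitespace(Element element) {
        NodeList children = element.getChildNodes();
        List<Node> blankText = new ArrayList<>();
        boolean hasChildElement = false;
        boolean hasRealText = false;

        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            short type = child.getNodeType();
            if (type == Node.ELEMENT_NODE) {
                hasChildElement = true;
                stripInterElementWhitespace((Element) child);
            } else if (type == Node.TEXT_NODE
                    && child.getNodeValue().trim().isEmpty()) {
                blankText.add(child);
            } else if (type == Node.TEXT_NODE
                    || type == Node.CDATA_SECTION_NODE) {
                hasRealText = true;
            }
        }

        // Remove only after the scan, so the live NodeList is not
        // modified while it is being iterated.
        if (hasChildElement && !hasRealText) {
            for (Node node : blankText) {
                element.removeChild(node);
            }
        }
    }

    // Strip both documents, then compare the root elements structurally.
    static boolean equalIgnoringWhitespace(String xmlA, String xmlB)
            throws Exception {
        Document a = parse(xmlA);
        Document b = parse(xmlB);
        stripInterElementWhitespace(a.getDocumentElement());
        stripInterElementWhitespace(b.getDocumentElement());
        return a.getDocumentElement().isEqualNode(b.getDocumentElement());
    }

    public static void main(String[] args) throws Exception {
        String compact = "<car><color>red</color><size>large</size></car>";
        String pretty =
                "<car>\n  <color>red</color>\n  <size>large</size>\n</car>";
        System.out.println(equalIgnoringWhitespace(compact, pretty));
    }
}
```

The caveat from the reply still applies. In something like
<p><b>a</b> <i>b</i></p> the space between the two elements is
probably significant, yet this heuristic would drop it, because nothing
in the instance says that <p> allows mixed content. Without a DTD or
schema the code can only guess, so it is best kept to data-style XML
where that guess is known to be safe.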