RE: [dom4j-user] Ignorable white space and confusion

Edelson, Justin Thu, 18 May 2006 13:48:11 -0700

> But why is a DTD necessary to determine ignorable white space when
> the whitespace I'm talking about is between elements:
>
> <car>
>
> <color>red</color><size>large</size>
> </car>
>
> is equivalent to
>
> <car>
> <color>red</color>
> <size>large</size>
> </car>

Without a DTD, how would a parser know that the whitespace between
</color> and <size> is ignorable? If <car> were declared with mixed
content, that whitespace would be significant character data, not
ignorable.
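
If you are willing to assume the documents are data-oriented (no mixed
content), the rule from the original message, "if the whitespace is the
only thing between two elements, ignore it", can be applied by hand
before comparing. Below is a minimal sketch using only the JDK's DOM
classes rather than dom4j. The class and method names
(WhitespaceCompare, stripInterElementWhitespace,
equalIgnoringWhitespace) are made up for illustration. It is a
heuristic, not what a validating parser does: it leaves any element
that contains non-whitespace text alone, but it would still drop a
meaningful space in something like <p><b>a</b> <i>b</i></p>.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class WhitespaceCompare {

    /** True for a text node that holds nothing but whitespace. */
    static boolean isBlankText(Node n) {
        return n.getNodeType() == Node.TEXT_NODE
                && n.getNodeValue().trim().isEmpty();
    }

    /**
     * Heuristic: if a node has element children and no non-whitespace
     * text (or CDATA) children, treat its whitespace-only text nodes
     * as ignorable and remove them. Recurse into child elements.
     */
    static void stripInterElementWhitespace(Node parent) {
        boolean hasElementChild = false;
        boolean hasRealText = false;
        for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
            short type = c.getNodeType();
            if (type == Node.ELEMENT_NODE) {
                hasElementChild = true;
            } else if (type == Node.CDATA_SECTION_NODE
                    || (type == Node.TEXT_NODE && !isBlankText(c))) {
                hasRealText = true;
            }
        }
        Node c = parent.getFirstChild();
        while (c != null) {
            Node next = c.getNextSibling(); // fetch before a possible removal
            if (hasElementChild && !hasRealText && isBlankText(c)) {
                parent.removeChild(c);
            } else if (c.getNodeType() == Node.ELEMENT_NODE) {
                stripInterElementWhitespace(c);
            }
            c = next;
        }
    }

    /** Parses without a DTD or validation; rejects malformed input. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** Compares the two root elements after stripping whitespace. */
    static boolean equalIgnoringWhitespace(String a, String b) {
        Document da = parse(a);
        Document db = parse(b);
        da.normalize(); // merge adjacent text nodes first
        db.normalize();
        stripInterElementWhitespace(da.getDocumentElement());
        stripInterElementWhitespace(db.getDocumentElement());
        return da.getDocumentElement().isEqualNode(db.getDocumentElement());
    }

    public static void main(String[] args) {
        String a = "<car>\n\n<color>red</color><size>large</size>\n</car>";
        String b = "<car>\n<color>red</color>\n<size>large</size>\n</car>";
        System.out.println(equalIgnoringWhitespace(a, b));
    }
}
```

With the two <car> documents from the question this reports them as
equal, while the <markup> example is left untouched because <markup>
contains real text alongside its child element.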

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Noah Davis
Sent: Thursday, May 18, 2006 4:43 PM
To: dom4j-user@lists.sourceforge.net
Subject: [dom4j-user] Ignorable white space and confusion

Forgive me if this isn't a dom4j specific question. I turned to dom4j
in hopes that it would help me solve a problem I was having: comparing
two XML documents.

I understand that comparing two XML documents isn't trivial, which is
why I've turned to third-party tools. However, I keep banging my head
against the wall with regard to white space and what is considered
"ignorable white space".

Articles on the subject I've read point out that it's necessary by
default to preserve white space when parsing an XML document because
the white space could represent actual significant data. Then they
show an example like:

<signature>

---------
Bill Ford
11 West Lane
Phili
</signature>

Or an example with mixed content like:

<markup>
my
<b>body</b> is strong</markup>

Ok, I understand that. And maybe a DTD can help determine what's
significant here and what's not.

But why is a DTD necessary to determine ignorable white space when
the whitespace I'm talking about is between elements:

<car>

<color>red</color><size>large</size>
</car>

is equivalent to

<car>
<color>red</color>
<size>large</size>
</car>

The logic that determines this is ignorable white space doesn't seem
that complicated: if the whitespace is the only thing between two
elements, ignore it. This may seem like a special case, but it sure
seems like it'd be a pretty prominent special case, especially when
XML is being used for data as opposed to documents.

Is there any tool in the alphabet soup of XML libraries -- JAXP, DOM,
SAX, JDOM -- that recognizes this and allows me to ignore this kind of
whitespace without having to specify a DTD? I tried dom4j's
NodeComparator and didn't have any luck.

If not, can anyone offer a recommendation of a simple way to strip
this whitespace so I can compare two obviously equal XML documents?

-------------------------------------------------------
Using Tomcat but need to do more? Need to support web services,
security?
Get stuff done quickly with pre-integrated technology to make your job
easier
Download IBM WebSphere Application Server v.1.0.1 based on Apache
Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
dom4j-user mailing list
dom4j-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dom4j-user
