Noah,

I think what you are trying to do is derive a canonical form of each XML
document for comparison.  This turns out to be more involved than simply
addressing ignorable whitespace.  A canonicalization scheme also has to
address things like the handling of empty tags (e.g. <MyTag></MyTag> versus
<MyTag/>), attribute ordering, character encoding, comment preservation, the
XML declaration, etc.  As you mentioned, this is a very useful concept,
especially when dealing with XML digital signatures.
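
To make a few of those points concrete, here is a rough sketch of a simplified
comparable form, written against the JDK's built-in DOM parser rather than
dom4j.  The class name SimpleCanonicalForm and its methods are made up for
illustration.  It is not a conforming Canonical XML implementation: it only
sorts attributes, always writes start/end tag pairs, and drops comments,
processing instructions and whitespace-only text.  Dropping whitespace-only
text matches what you are after, but it is a deliberate deviation, since
Canonical XML keeps whitespace inside the document element.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper for illustration only; NOT a W3C Canonical XML implementation.
public class SimpleCanonicalForm {

    /** Parses the XML text and returns a simplified string form suitable for comparison. */
    public static String canonicalize(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        StringBuilder out = new StringBuilder();
        // Starting at the document element leaves out the XML declaration.
        write(doc.getDocumentElement(), out);
        return out.toString();
    }

    private static void write(Element element, StringBuilder out) {
        // Sort attributes by name so their order in the source does not matter.
        Map<String, String> attrs = new TreeMap<>();
        NamedNodeMap raw = element.getAttributes();
        for (int i = 0; i < raw.getLength(); i++) {
            attrs.put(raw.item(i).getNodeName(), raw.item(i).getNodeValue());
        }
        out.append('<').append(element.getNodeName());
        for (Map.Entry<String, String> attr : attrs.entrySet()) {
            out.append(' ').append(attr.getKey()).append("=\"")
               .append(escape(attr.getValue())).append('"');
        }
        out.append('>');

        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            short type = child.getNodeType();
            if (type == Node.ELEMENT_NODE) {
                write((Element) child, out);
            } else if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
                String text = child.getNodeValue();
                // Assumption of this sketch: whitespace-only text is not significant.
                if (!text.trim().isEmpty()) {
                    out.append(escape(text));
                }
            }
            // Comments and processing instructions are dropped.
        }

        // Always write an explicit end tag, so <MyTag/> and <MyTag></MyTag> come out the same.
        out.append("</").append(element.getNodeName()).append('>');
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }
}
```

With this, two documents that differ only in attribute order, empty-tag style,
comments or indentation produce the same string, so a plain equals() on the
results is enough for a quick comparison.  For anything signature-related a
real Canonical XML implementation is still the right tool.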

The W3C has published a Recommendation called Canonical XML for this purpose
(see http://www.w3.org/TR/xml-c14n).

The Apache XML Security project (http://xml.apache.org/security/) has a
Canonicalizer class that implements this W3C Recommendation.

-Mark

-----Original Message-----
From: Noah Davis [mailto:[EMAIL PROTECTED] 
Sent: Thursday, May 18, 2006 6:50 PM
To: Edelson, Justin
Cc: dom4j-user@lists.sourceforge.net
Subject: Re: [dom4j-user] Ignorable white space and confusion


So I've ended up writing a little piece of code to remove whitespace text
nodes:

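        // Needs: import org.dom4j.Branch; import org.dom4j.Element;
        //        import org.dom4j.Node;
        public static void removeWhitespaceNodes(Branch a_branch)
        {
                // Walk backwards: detaching a node shifts the indexes of the
                // nodes after it, so a forward loop would skip the next node.
                for (int i = a_branch.nodeCount() - 1; i >= 0; i--)
                {
                        Node checkNode = a_branch.node(i);
                        if (checkNode.getNodeType() == Node.TEXT_NODE)
                        {
                                // Drop text nodes that contain only whitespace.
                                if (checkNode.getText().trim().equals(""))
                                {
                                        checkNode.detach();
                                }
                        } else if (checkNode.getNodeType() == Node.ELEMENT_NODE)
                        {
                                // Recurse into child elements.
                                removeWhitespaceNodes((Element)checkNode);
                        }
                }
        }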


-------------------------------------------------------
Using Tomcat but need to do more? Need to support web services, security?
Get stuff done quickly with pre-integrated technology to make your job
easier Download IBM WebSphere Application Server v.1.0.1 based on Apache
Geronimo
http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
_______________________________________________
dom4j-user mailing list
dom4j-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dom4j-user


