[dom4j-dev] new options to merge text nodes and strip whitespace nodes

James Strachan Thu, 15 Nov 2001 16:28:22 -0800

I've added two new options to SAXReader to allow faster parsing of
data-centric XML documents (i.e. documents where parsing performance
is more important than the perfect preservation of whitespace).


In particular, a common kind of XML such as the following

<foo>
    <bar>
        <x>1234</x>
    </bar>
</foo>

would create 5 text nodes by default, only 1 of which contains "1234"; the
others contain nothing but whitespace. The new 'stripWhitespaceText' option
removes all text that consists entirely of whitespace between sequential
pairs of start/end tags, so the above would consist of 3 elements and 1 text
node, as most people new to XML would expect. In addition, the new
'mergeAdjacentText' option ensures that adjacent text nodes are concatenated
into a single Text node.
It's documented more fully on the status page on the website.

When applied to reasonably complex documents, these two options together
can lead to a performance improvement of roughly 10%, give or take 5%,
depending on the nesting and structure of the document.

James




