Hi Manfred

I hope you don't mind that I CC'd the dom4j-dev list so this response
gets archived...

----- Original Message -----
From: "Manfred Lotz" <[EMAIL PROTECTED]>
> Hi James,
>
> After further testing I found that we now have a new bug in the
> XMLWriter class.
>
> Using the modified XMLWriter class something like this happened:
>
> <meaning>to live</meaning>
>
> gets changed into
>
> <meaning>to li ve</meaning>
>
> It doesn't happen for all <meaning> tags in the document but for some???!!

I guess that's the SAX parser splitting up text at buffer boundaries.
e.g. if it reads in chunks of 1K and a text block spans a chunk boundary,
then two SAX characters() callbacks will be fired, which results in two
Text nodes. Incidentally this behaviour can be disabled by calling
SAXReader.setMergeAdjacentTextNodes(true).
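
To illustrate the chunking, here is a minimal sketch that uses only the
JDK's SAX API. It is not dom4j code, and the class name ChunkCollector is
made up for this example. It records each characters() callback as its own
chunk and also appends it to a buffer, which is roughly the difference
between separate and merged Text nodes.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper for illustration only; not part of dom4j.
class ChunkCollector extends DefaultHandler {
    // One entry per characters() callback, in document order.
    final List<String> chunks = new ArrayList<>();
    // All callbacks appended together, as if adjacent text were merged.
    private final StringBuilder merged = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        String chunk = new String(ch, start, length);
        chunks.add(chunk);
        merged.append(chunk);
    }

    String mergedText() {
        return merged.toString();
    }

    // Parses the given XML string and returns the populated collector.
    static ChunkCollector parse(String xml) {
        try {
            ChunkCollector handler = new ChunkCollector();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }
}
```

How many entries end up in chunks depends on the parser and on where its
buffer boundaries fall, so code should only rely on mergedText().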

So there is a bug in the current whitespace handling when adjacent text
nodes are present - it always prints a space character between them, which
breaks your document.

> I had a look at the source. It seems to be that the portions of text
> which go into writeNode(Node node) in the case of  nodeType ==
> Node.TEXT_NODE are unpredictable. In the example above it gets fed by
> "to li" the 1st time and by "ve" the second time which makes it
> impossible to fix it easily in writeString(String text).

Yes. This is the SAX parser splitting up a piece of text into separate
characters() callbacks, which result in separate Text nodes unless the
"mergeAdjacentTextNodes" property is enabled.

I have added a test case, testWhitespaceBug2(), to the JUnit test suite
org.dom4j.TestXMLWriter which tests your use case (and it was indeed
broken).

I've now fixed it. Essentially the trick is that when whitespace trimming is
enabled, all consecutive Text nodes are concatenated before the text gets
StringTokenized. The code is a little longer than I would like but it's all
in the writeElementContent() method in XMLWriter if you're interested in
taking a peek. It's all in CVS right now - hopefully we can get the daily
build working again soon.
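
As a rough sketch of the idea (this is not the actual
writeElementContent() code; the class and method names below are
hypothetical), compare tokenizing each text chunk on its own with
concatenating the chunks first and tokenizing once:

```java
import java.util.Collections;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical illustration only; not the real XMLWriter implementation.
class WhitespaceSketch {
    // Old behaviour: each chunk is tokenized separately and every token is
    // joined with a single space, so a word split across chunks gains a space.
    static String trimPerNode(List<String> chunks) {
        StringBuilder out = new StringBuilder();
        for (String chunk : chunks) {
            StringTokenizer tokens = new StringTokenizer(chunk);
            while (tokens.hasMoreTokens()) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(tokens.nextToken());
            }
        }
        return out.toString();
    }

    // Fixed behaviour: concatenate consecutive chunks first, then tokenize
    // the combined text once.
    static String trimMerged(List<String> chunks) {
        StringBuilder merged = new StringBuilder();
        for (String chunk : chunks) {
            merged.append(chunk);
        }
        return trimPerNode(Collections.singletonList(merged.toString()));
    }
}
```

With the chunks "to li" and "ve" from your example, trimPerNode() gives
"to li ve" while trimMerged() gives "to live".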

Thanks for spotting this issue Manfred! Hopefully it's now fixed.

James






_______________________________________________
dom4j-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dom4j-dev
