Hello,

I'm noticing that the Crimson library inserts a lot of extra whitespace
into the XML text streams that it generates from a DOM.  The extra
whitespace doesn't appear if a DOCTYPE reference is present, since the code
can then use a validating parser and call the
DocumentBuilderFactory.setIgnoringElementContentWhitespace() method.  I've
attached a Java class that parses and regenerates the XML text stream
(string->DOM->string), along with an example XML document.  Here's the
output when the program is run repeatedly, with each run's output fed back
in as the next run's input:

First program run output:

Input:
<?xml version="1.0"?>
<Test>
  <foo value="bar" />
  <shaz value="bat" />
</Test>

Parsing String to DOM ...
Generating String from DOM ...
Output:
<?xml version="1.0"?>

<Test>

  <foo value="bar" />

  <shaz value="bat" />

</Test>


Second program run output (on first run's output):

Input:
<?xml version="1.0"?>

<Test>

  <foo value="bar" />

  <shaz value="bat" />

</Test>


Parsing String to DOM ...
Generating String from DOM ...
Output:
<?xml version="1.0"?>

<Test>


  <foo value="bar" />


  <shaz value="bat" />


</Test>

Notice that each successive run adds more blank lines.  This causes
problems in applications that repeatedly parse XML and then re-generate it,
since the XML grows without bound even if no modifications are made.  Is
there some set of JAXP/DOM calls that can be made (or removed from the
example code) to prevent this?  Walking the DOM tree and removing the
whitespace-only text nodes isn't acceptable in my case because of the
performance overhead.
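
For anyone reading without the attachment, here is a minimal sketch of the kind of string->DOM->string round trip described above, together with the DOCTYPE-based case.  It is not the attached TestCrimson.java: the class name RoundTripSketch, the inline DTD and the use of an identity Transformer for output are all assumptions made for illustration, and it runs against whatever JAXP implementation is on the classpath, so it may not reproduce Crimson's behaviour.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class RoundTripSketch {

    // Hypothetical test document: the Test/foo/shaz example from this message,
    // plus an internal DTD subset so that a validating parser can tell which
    // whitespace is element-content ("ignorable") whitespace.
    public static final String XML_WITH_DTD =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE Test [\n"
        + "  <!ELEMENT Test (foo, shaz)>\n"
        + "  <!ELEMENT foo EMPTY>\n"
        + "  <!ATTLIST foo value CDATA #REQUIRED>\n"
        + "  <!ELEMENT shaz EMPTY>\n"
        + "  <!ATTLIST shaz value CDATA #REQUIRED>\n"
        + "]>\n"
        + "<Test>\n"
        + "  <foo value=\"bar\" />\n"
        + "  <shaz value=\"bat\" />\n"
        + "</Test>\n";

    // String -> DOM. When dropIgnorableWhitespace is true, the factory is set
    // up as a validating parser that discards element-content whitespace,
    // which only works when the document declares its content model.
    public static Document parse(String xml, boolean dropIgnorableWhitespace)
            throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        if (dropIgnorableWhitespace) {
            factory.setValidating(true);
            factory.setIgnoringElementContentWhitespace(true);
        }
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // DOM -> String, using an identity transform (an assumption; the attached
    // class may generate its output differently).
    public static String generate(Document doc) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        StringWriter out = new StringWriter();
        identity.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document kept = parse(XML_WITH_DTD, false);
        Document dropped = parse(XML_WITH_DTD, true);

        // Count the direct children of <Test>, whitespace text nodes included.
        System.out.println("Children of <Test>, whitespace kept:    "
            + kept.getDocumentElement().getChildNodes().getLength());
        System.out.println("Children of <Test>, whitespace dropped: "
            + dropped.getDocumentElement().getChildNodes().getLength());

        // Feed the generated text back in a few times to check for growth.
        String text = XML_WITH_DTD;
        for (int run = 1; run <= 3; run++) {
            text = generate(parse(text, false));
            System.out.println("Run " + run + " output length: " + text.length());
        }
    }
}
```

The child counts show whether the parser kept the whitespace-only text nodes, and the loop at the end is the cyclic test from this message: if the reported length keeps growing from run to run, the output side is adding whitespace.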


Thanks,

Bentley


 <<TestCrimson.java>>  <<test.xml>> 

