Edwin,

How about the following approach:
In class org.apache.crimson.tree.ParentNode, the following method can be found:

    public void writeChildrenXml (XmlWriteContext context) throws IOException

This method writes out child nodes in 'pretty-print' fashion to the Writer associated with the context passed in. I added code to check for TEXT nodes that are all whitespace when not preserving whitespace. If a node meets these conditions, a single space character is written in place of the node's current value. (See the code between the BEGIN ADDITIONS/END ADDITIONS text below.) This prevents the whitespace from growing by an indentation amount each time the same XML is parsed and then streamed out again. It also reduces non-indentation-related TEXT nodes that are all whitespace to a single space, unless the xml:space='preserve' attribute is set.

    public void writeChildrenXml (XmlWriteContext context) throws IOException
    {
        if (children == null)
            return;

        int     oldIndent = 0;
        boolean preserve = true;
        boolean pureText = true;

        if (getNodeType () == ELEMENT_NODE) {
            preserve = "preserve".equals (
                    getInheritedAttribute ("xml:space"));
            oldIndent = context.getIndentLevel ();
        }

        try {
            if (!preserve)
                context.setIndentLevel (oldIndent + 2);
            for (int i = 0; i < length; i++) {
                if (!preserve && children [i].getNodeType () != TEXT_NODE) {
                    context.printIndent ();
                    pureText = false;
                }

                // BEGIN ADDITIONS.
                // If we're not preserving whitespace, and the current node
                // is a TEXT node containing whitespace and nothing else,
                // write a single space instead of the node's value.
                // if ( !preserve && (children [i].getNodeType() == TEXT_NODE)
                //      && context.isIndent(children [i].getNodeValue()) )
                if ( !preserve && (children [i].getNodeType() == TEXT_NODE)
                     && isWhitespace(children [i].getNodeValue()) )
                {
                    // Normalize whitespace to one space character?
                    Writer out = context.getWriter ();
                    out.write( ' ' );
                    continue;
                }
                // END ADDITIONS.

                children [i].writeXml (context);
            }
        } finally {
            if (!preserve) {
                context.setIndentLevel (oldIndent);
                if (!pureText)
                    context.printIndent ();     // for ETag
            }
        }
    }

The isWhitespace() method just checks the String value to see if it consists entirely of whitespace:

    // BEGIN ADDITIONS.
    private boolean isWhitespace ( String value )
    {
        if ( (value == null) || (value.length() == 0) )
            return false;

        int len = value.length( );
        for (int i = 0; i < len; i++)
        {
            // Character.isSpaceChar() doesn't work here.
            // Should the following check - taken from method
            // removeWhiteSpaces() - be used instead?
            // if (c == ' ' || c == '\t' || c == '\n' || c == '\r')
            if ( !Character.isWhitespace( value.charAt(i) ) )
                return false;
        }
        return true;
    }
    // END ADDITIONS.

I tried making the code more discriminating by adding an isIndent() method to the XmlWriteContext class that checks for whitespace in a fashion similar to that used by printIndent() to generate it, but the algorithm failed due to line separator problems (I'm testing on Win-NT). See the code below.

    // BEGIN ADDITION
    public boolean isIndent ( String value ) throws IOException
    {
        if ( value == null )
            return false;

        if (!prettyOutput)
            return false;

        // NOTE: The following code won't work unless the line separator
        // is obtained properly!!! (See "line.separator" system property.)

        // Check that value is long enough to be indentation.
        if ( value.length() < (XmlDocument.eol.length() + indentLevel) )
            return false;

        // Check that value starts with EOL.
        if( !value.startsWith( XmlDocument.eol ) )
            return false;

        // Advance past EOL.
        value = value.substring( XmlDocument.eol.length() );

        for ( int i = 0; i < indentLevel; i++ )
        {
            if ( value.charAt(i) != ' ' )
                return false;
        }
        return true;
    }
    // END ADDITION

I tested the code using my unit tests and it works for my application, but I don't know whether it breaks DOM conformance, as I don't have an exhaustive conformance test suite. I've attached the modified source files (ParentNode, XmlWriteContext).
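As a rough illustration of the two open questions in the comments above (which whitespace test to use, and why the EOL comparison might fail), here is a minimal standalone sketch. The class and method names (WhitespaceChecks, isXmlWhitespace, and this variant of isIndent) are hypothetical and are not part of Crimson. It relies on two things: XML 1.0 defines whitespace as exactly space, tab, carriage return and line feed, and an XML parser normalizes line breaks to "\n" before handing text to the application. If XmlDocument.eol holds the platform separator ("\r\n" on Win-NT), which is an assumption here, a parsed TEXT node would never start with it. Character.isSpaceChar() returns false for tab and newline because they are control characters rather than Unicode separators, while Character.isWhitespace() also accepts characters such as form feed that are not XML whitespace.

```java
// Hypothetical helper class, not part of Crimson; a sketch only.
final class WhitespaceChecks {

    // XML 1.0 whitespace: space, tab, carriage return, line feed.
    static boolean isXmlWhitespaceChar(char c) {
        return c == ' ' || c == '\t' || c == '\n' || c == '\r';
    }

    // True if value is non-empty and consists only of XML whitespace.
    static boolean isXmlWhitespace(String value) {
        if (value == null || value.isEmpty()) {
            return false;
        }
        for (int i = 0; i < value.length(); i++) {
            if (!isXmlWhitespaceChar(value.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    // Assumed variant of isIndent(): accepts "\r\n", "\n" or "\r" as the
    // line break, so the result does not depend on line.separator, and
    // requires exactly indentLevel spaces after it.
    static boolean isIndent(String value, int indentLevel) {
        if (value == null) {
            return false;
        }
        int pos;
        if (value.startsWith("\r\n")) {
            pos = 2;
        } else if (value.startsWith("\n") || value.startsWith("\r")) {
            pos = 1;
        } else {
            return false;
        }
        if (value.length() - pos != indentLevel) {
            return false;
        }
        for (int i = pos; i < value.length(); i++) {
            if (value.charAt(i) != ' ') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Compare the JDK predicates with the XML-specific check.
        char[] samples = { ' ', '\t', '\n', '\r', '\f', '\u00A0' };
        for (char c : samples) {
            System.out.println((int) c
                    + " isSpaceChar=" + Character.isSpaceChar(c)
                    + " isWhitespace=" + Character.isWhitespace(c)
                    + " isXmlWhitespaceChar=" + isXmlWhitespaceChar(c));
        }
        System.out.println(isIndent("\n    ", 4));
    }
}
```

Requiring an exact length in isIndent() is a design choice for this sketch; the version above only checks that the value is at least long enough.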
Thanks,
Bentley

-----Original Message-----
From: Edwin Goei [mailto:[EMAIL PROTECTED]]
Sent: Saturday, December 01, 2001 11:53 AM
To: Bentley Drake
Cc: '[EMAIL PROTECTED]'
Subject: Re: JAXP/Crimson whitespace issue question

Bentley Drake wrote:
>
> Hello,
>
> I'm noticing that the Crimson library seems to insert an awful lot of
> whitespace into XML text streams that it generates from DOM. This
> whitespace doesn't appear if a DOCTYPE reference is present, since the code
> can use a validating parser and can therefore call the
> DocumentBuilderFactory.setIgnoringElementContentWhitespace() method. I've
> attached a Java class that parses and regenerates the XML text stream
> (string->DOM->string), along with an example XML. Here's an example of the
> output when the program is run against an XML document in a cyclic fashion:

Not sure I have time to work on this now. Maybe you can provide a patch.
It looks like the code is using internal, non-JAXP APIs. If you have
Xalan or the latest version of the JAXP RI 1.1.3, then you can use the
TrAX API to do what you want. See my unofficial JAXP FAQ at
http://xml.apache.org/~edwing/jaxp-faq.html for more info.

-Edwin
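For reference, the TrAX route suggested above might look roughly like the following. This is a minimal sketch using only standard JAXP classes; the class and method names (TraxRoundTrip, roundTrip) are hypothetical, and it runs against whatever parser and transformer implementation is on the classpath, which need not be Crimson or Xalan. An identity Transformer without indentation copies the DOM as it is, so it should not add whitespace on each string->DOM->string pass.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class; a sketch of a string->DOM->string round trip.
final class TraxRoundTrip {

    // Parses xml into a DOM and serializes it again with an identity
    // transform. Checked exceptions are wrapped to keep callers simple.
    static String roundTrip(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));

            // newTransformer() with no stylesheet gives an identity transform.
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            StringWriter out = new StringWriter();
            identity.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XML round trip failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<a>\n  <b>x</b>\n</a>";
        String once = roundTrip(xml);
        String twice = roundTrip(once);
        // If no whitespace is added per pass, the two results match.
        System.out.println(once.equals(twice));
    }
}
```

Because this uses only the public JAXP API, it avoids depending on Crimson internals such as ParentNode and XmlWriteContext.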
XmlWriteContext.java
Description: Binary data
ParentNode.java
Description: Binary data