Re: [dom4j-dev] fix for writing numeric entities in XMLWriter

Amelia A Lewis Fri, 21 Jun 2002 05:11:57 -0700

*sigh*

On Thu, 2002-06-20 at 21:59, Terry Steichen wrote:
> I've had this nagging problem in that I read in an HTML file with, for
> example, &#150;&#150; in it.  I then 'Tidy' it up and use dom4j to extract
> the details and construct a Document and write it out using XMLWriter.  What
> I end up seeing is ?? where the &#150;&#150; was.  It should (sort of) look
> like '--'.  Would your fix handle this, or will it only handle characters
> that are actually embedded as raw numbers?


These are not the characters you think they are.  In XML 1.0, numeric
character references may not represent control characters in the range
0-31 (with exceptions for tab, line feed, and carriage return); codes
128-159 are permitted but are still control characters.  These are the
Unicode "control characters", blocks C0 and C1.  In Unicode, the dash
character you are trying to encode doesn't live in the control character
range (it doesn't live there in ISO 8859, either, which respects the
high control block, so I can confidently predict that you're using
Windows, and code page 1252).
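
A minimal sketch of that classification, assuming the Char production
from the XML 1.0 specification.  The class and method names (XmlChars,
isXmlChar, isControl) are made up for illustration and are not part of
dom4j.  Under that production code 150 is accepted by a parser, but it
is still a C1 control character and not a dash.

```java
/** Sketch: classifies code points against the XML 1.0 Char production. */
final class XmlChars {

    /** True if the code point matches the XML 1.0 "Char" production. */
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** True for ISO control characters (U+0000-U+001F or U+007F-U+009F). */
    static boolean isControl(int cp) {
        return Character.isISOControl(cp);
    }

    public static void main(String[] args) {
        // 8 = backspace, 10 = line feed, 150 = the code from the question,
        // 160 = no-break space, 8211 = U+2013 EN DASH
        int[] samples = { 8, 10, 150, 160, 8211 };
        for (int cp : samples) {
            System.out.println(cp
                + " xmlChar=" + isXmlChar(cp)
                + " control=" + isControl(cp));
        }
    }
}
```

A writer that wants to warn about suspicious input could flag any
character for which isControl is true and that is not tab, line feed,
or carriage return.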

To get the character properly, you should determine what its code is in
its proper location (sorry, too early in the morning for me to look
through my Unicode book), and use that (and hope that Windows translates
it back for display).
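
One way to look the code up without the Unicode book is to let Java
decode the byte as code page 1252.  This is only a sketch: it assumes
the JVM ships the "windows-1252" charset (mainstream JDKs do, but it is
not one of the charsets every JVM must provide), and the class and
method names are invented for the example.

```java
import java.nio.charset.Charset;

/** Sketch: maps a Windows-1252 byte to its Unicode numeric character reference. */
final class Cp1252Reference {

    // Assumption: this charset is available on the running JVM.
    private static final Charset CP1252 = Charset.forName("windows-1252");

    /** Decodes one cp1252 byte and formats the result as a decimal reference. */
    static String toNumericReference(int cp1252Byte) {
        String decoded = new String(new byte[] { (byte) cp1252Byte }, CP1252);
        return "&#" + decoded.codePointAt(0) + ";";
    }

    public static void main(String[] args) {
        // Byte 150 in cp1252 is the en dash, U+2013.
        System.out.println(toNumericReference(150)); // prints &#8211;
    }
}
```

So the source HTML would carry &#8211; in place of &#150;, and the
XMLWriter fix discussed below would then write that reference back out.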

Amy!
> 
> Regards,
> 
> Terry
> 
> ----- Original Message -----
> From: "Dan Jacobs" <[EMAIL PROTECTED]>
> To: <[EMAIL PROTECTED]>
> Sent: Wednesday, June 19, 2002 11:10 PM
> Subject: [dom4j-dev] fix for writing numeric entities in XMLWriter
> 
> 
> > I found and fixed a simple bug in XMLWriter.java.  A fixed version is
> > attached, but not checked into the source repository.  I'll leave that
> > for the folks who maintain the sources.
> >
> > If you parse and print out (using asXML) an XML document containing a
> > numeric entity reference such as &#160; (the code for &nbsp;) the code
> > was just writing out a byte with a value of 160 (decimal).  The fix
> > (abstracted below) is to check for characters with integer codes below
> > 32 and above 126 (excluding standard whitespace characters) and encode
> > them as numeric entities.  The same fix is made in two places in
> > XMLWriter.java.
> >
> > Thanks for reading.
> > -- Dan Jacobs
> >
> >             char c;     // declaration and assignment added by Dan Jacobs
> >             switch( c = text.charAt(i) ) {
> >                 case '<' :
> >                     entity = "&lt;";
> >                     break;
> >                 case '>' :
> >                     entity = "&gt;";
> >                     break;
> >                 case '&' :
> >                     entity = "&amp;";
> >                     break;
> >
> >                 //!!! Begin code added by Dan Jacobs !!!//
> >                 case '\t': case '\n': case '\r':
> >                     // don't encode standard whitespace characters
> >                     break;
> >                 default:
> >                     // encode low and high characters as entities
> >                     if ((c < 32) || (c >= 127))
> >                         entity = "&#" + (int)c + ";";
> >                     break;
> >                 //!!! End code added by Dan Jacobs !!!//
> >             }
> >
> > --
> > Daniel S. Jacobs
> > President, Model Objects Group
> > Object-Orient Software Engineering
> > Java & Web Application Development
> >
> >
> 
> 
> --------------------------------------------------------------------------------
> 
> 
> > /*
> >  * Copyright 2001 (C) MetaStuff, Ltd. All Rights Reserved.
> >  *
> >  * This software is open source.
> >  * See the bottom of this file for the licence.
> >  *
> >  * $Id: XMLWriter.java,v 1.46 2002/02/14 11:55:46 jstrachan Exp $
> >  */
> >
> > package org.dom4j.io;
> >
> > import java.io.BufferedOutputStream;
> > import java.io.BufferedWriter;
> > import java.io.ByteArrayOutputStream;
> > import java.io.IOException;
> > import java.io.OutputStream;
> > import java.io.OutputStreamWriter;
> > import java.io.StringWriter;
> > import java.io.UnsupportedEncodingException;
> > import java.io.Writer;
> > import java.util.HashMap;
> > import java.util.HashSet;
> > import java.util.Iterator;
> > import java.util.LinkedList;
> > import java.util.List;
> > import java.util.Map;
> > import java.util.Set;
> > import java.util.StringTokenizer;
> >
> > import org.dom4j.Attribute;
> > import org.dom4j.CDATA;
> > import org.dom4j.CharacterData;
> > import org.dom4j.Comment;
> > import org.dom4j.DocumentType;
> > import org.dom4j.Document;
> > import org.dom4j.Element;
> > import org.dom4j.Entity;
> > import org.dom4j.Namespace;
> > import org.dom4j.Node;
> > import org.dom4j.ProcessingInstruction;
> > import org.dom4j.Text;
> >
> > import org.dom4j.tree.NamespaceStack;
> >
> > import org.xml.sax.Attributes;
> > import org.xml.sax.ContentHandler;
> > import org.xml.sax.DTDHandler;
> > import org.xml.sax.InputSource;
> > import org.xml.sax.Locator;
> > import org.xml.sax.SAXException;
> > import org.xml.sax.SAXNotRecognizedException;
> > import org.xml.sax.SAXNotSupportedException;
> > import org.xml.sax.XMLReader;
> > import org.xml.sax.ext.LexicalHandler;
> > import org.xml.sax.helpers.XMLFilterImpl;
> >
> > /**<p><code>XMLWriter</code> takes a DOM4J tree and formats it to a
> >   * stream as XML.
> >   * It can also take SAX events, so it can be used by SAX clients, as this
> >   * object implements the {@link ContentHandler} and {@link LexicalHandler}
> >   * interfaces. This formatter performs typical document
> >   * formatting.  The XML declaration and processing instructions are
> >   * always on their own lines. An {@link OutputFormat} object can be
> >   * used to define how whitespace is handled when printing and allows various
> >   * configuration options, such as to allow suppression of the XML declaration,
> >   * the encoding declaration or whether empty documents are collapsed.</p>
> >   *
> >   * <p> There are <code>write(...)</code> methods to print any of the
> >   * standard DOM4J classes, including <code>Document</code> and
> >   * <code>Element</code>, to either a <code>Writer</code> or an
> >   * <code>OutputStream</code>.  Warning: using your own
> >   * <code>Writer</code> may cause the writer's preferred character
> >   * encoding to be ignored.  If you use encodings other than UTF8, we
> >   * recommend using the method that takes an OutputStream instead.
> >   * </p>
> >   *
> >   * @author <a href="mailto:[EMAIL PROTECTED]">James Strachan</a>
> >   * @author Joseph Bowbeer
> >   * @version $Revision: 1.46 $
> >   */
> > public class XMLWriter extends XMLFilterImpl implements LexicalHandler {
> >
> >     protected static final String[] LEXICAL_HANDLER_NAMES = {
> >         "http://xml.org/sax/properties/lexical-handler",
> >         "http://xml.org/sax/handlers/LexicalHandler"
> >     };
> >
> >     private static final boolean ESCAPE_TEXT = true;
> >     private static final boolean SUPPORT_PAD_TEXT = false;
> >
> >     protected static final OutputFormat DEFAULT_FORMAT = new OutputFormat();
> >
> >     /** Stores the last type of node written so algorithms can refer to the
> >       * previous node type */
> >     protected int lastOutputNodeType;
> >
> >     /** The Writer used to output to */
> >     protected Writer writer;
> >
> >     /** The Stack of namespaceStack written so far */
> >     private NamespaceStack namespaceStack = new NamespaceStack();
> >
> >     /** The format used by this writer */
> >     private OutputFormat format;
> >     /** The initial number of indentations (so you can print a whole
> >         document indented, if you like) **/
> >     private int indentLevel = 0;
> >
> >     /** buffer used when escaping strings */
> >     private StringBuffer buffer = new StringBuffer();
> >
> >     /** Whether a flush should occur after writing a document */
> >     private boolean autoFlush;
> >
> >     /** Lexical handler we should delegate to */
> >     private LexicalHandler lexicalHandler;
> >
> >     /** Whether comments should appear inside DTD declarations - defaults to false */
> >     private boolean showCommentsInDTDs;
> >
> >     /** Is the writer currently inside a DTD definition? */
> >     private boolean inDTD;
> >
> >
> >     public XMLWriter(Writer writer) {
> >         this( writer, DEFAULT_FORMAT );
> >     }
> >
> >     public XMLWriter(Writer writer, OutputFormat format) {
> >         this.writer = writer;
> >         this.format = format;
> >     }
> >
> >     public XMLWriter() {
> >         this.format = DEFAULT_FORMAT;
> >         this.writer = new BufferedWriter( new OutputStreamWriter( System.out ) );
> >         this.autoFlush = true;
> >     }
> >
> >     public XMLWriter(OutputStream out) throws UnsupportedEncodingException {
> >         this.format = DEFAULT_FORMAT;
> >         this.writer = createWriter(out, format.getEncoding());
> >         this.autoFlush = true;
> >     }
> >
> >     public XMLWriter(OutputStream out, OutputFormat format) throws UnsupportedEncodingException {
> >         this.format = format;
> >         this.writer = createWriter(out, format.getEncoding());
> >         this.autoFlush = true;
> >     }
> >
> >     public XMLWriter(OutputFormat format) throws UnsupportedEncodingException {
> >         this.format = format;
> >         this.writer = createWriter( System.out, format.getEncoding() );
> >         this.autoFlush = true;
> >     }
> >
> >
> >     public void setWriter(Writer writer) {
> >         this.writer = writer;
> >         this.autoFlush = false;
> >     }
> >
> >     public void setOutputStream(OutputStream out) throws UnsupportedEncodingException {
> >         this.writer = createWriter(out, format.getEncoding());
> >         this.autoFlush = true;
> >     }
> >
> >
> >     /** Set the initial indentation level.  This can be used to output
> >       * a document (or, more likely, an element) starting at a given
> >       * indent level, so it's not always flush against the left margin.
> >       * Default: 0
> >       *
> >       * @param indentLevel the number of indents to start with
> >       */
> >     public void setIndentLevel(int indentLevel) {
> >         this.indentLevel = indentLevel;
> >     }
> >
> >     /** Flushes the underlying Writer */
> >     public void flush() throws IOException {
> >         writer.flush();
> >     }
> >
> >     /** Closes the underlying Writer */
> >     public void close() throws IOException {
> >         writer.close();
> >     }
> >
> >     /** Writes the new line text to the underlying Writer */
> >     public void println() throws IOException {
> >         writer.write( format.getLineSeparator() );
> >     }
> >
> >     /** Writes the given {@link Attribute}.
> >       *
> >       * @param attribute <code>Attribute</code> to output.
> >       */
> >     public void write(Attribute attribute) throws IOException {
> >         writeAttribute(attribute);
> >
> >         if ( autoFlush ) {
> >             flush();
> >         }
> >     }
> >
> >
> >     /** <p>This will print the <code>Document</code> to the current Writer.</p>
> >      *
> >      * <p> Warning: using your own Writer may cause the writer's
> >      * preferred character encoding to be ignored.  If you use
> >      * encodings other than UTF8, we recommend using the method that
> >      * takes an OutputStream instead.  </p>
> >      *
> >      * <p>Note: as with all Writers, you may need to flush() yours
> >      * after this method returns.</p>
> >      *
> >      * @param doc <code>Document</code> to format.
> >      * @throws <code>IOException</code> - if there's any problem writing.
> >      **/
> >     public void write(Document doc) throws IOException {
> >         writeDeclaration();
> >
> >         if (doc.getDocType() != null) {
> >             indent();
> >             writeDocType(doc.getDocType());
> >         }
> >
> >         for ( int i = 0, size = doc.nodeCount(); i < size; i++ ) {
> >             Node node = doc.node(i);
> >             writeNode( node );
> >         }
> >         writePrintln();
> >
> >         if ( autoFlush ) {
> >             flush();
> >         }
> >     }
> >
> >     /** <p>Writes the <code>{@link Element}</code>, including
> >       * its <code>{@link Attribute}</code>s, and its value, and all
> >       * its content (child nodes) to the current Writer.</p>
> >       *
> >       * @param element <code>Element</code> to output.
> >       */
> >     public void write(Element element) throws IOException {
> >         writeElement(element);
> >
> >         if ( autoFlush ) {
> >             flush();
> >         }
> >     }
> >
> >
> >     /** Writes the given {@link CDATA}.
> >       *
> >       * @param cdata <code>CDATA</code> to output.
> >       */
> >     public void write(CDATA cdata) throws IOException {
> >         writeCDATA( cdata.getText() );
> >
> >         if ( autoFlush ) {
> >             flush();
> >         }
> >     }
> >
> >     /** Writes the given {@link Comment}.
> >       *
> >       * @param comment <code>Comment</code> to output.
> >       */
> >     public void write(Comment comment) throws IOException {
> >         writeComment( comment.getText() );
> >
> >         if ( autoFlush ) {
> >             flush();
> >         }
> >     }
> >
> >     /** Writes the given {@link DocumentType}.
> >       *
> >       * @param docType <code>DocumentType</code> to output.
> >       */
> >     public void write(DocumentType docType) throws IOException {
> >         writeDocType(docType);
> >
> >         if ( autoFlush ) {
> >             flush();
> >         }
> >     }
> >
> >
> >     /** Writes the given {@link Entity}.
> >       *
> >       * @param entity <code>Entity</code> to output.
> >       */
> >     public void write(Entity entity) throws IOException {
> >         writeEntity( entity );
> >
> >         if ( autoFlush ) {
> >             flush();
> >         }
> >     }
> >
> >
> >     /** Writes the given {@link Namespace}.
> >       *
> >       * @param namespace <code>Namespace</code> to output.
> >       */
> >     public void write(Namespace namespace) throws IOException {
> >         writeNamespace(namespace);
> >
> >         if ( autoFlush ) {
> >             flush();
> >         }
> >     }
> >
> >     /** Writes the given {@link ProcessingInstruction}.
> >       *
> >       * @param processingInstruction <code>ProcessingInstruction</code> to output.
> >       */
> >     public void write(ProcessingInstruction processingInstruction) throws IOException {
> >         writeProcessingInstruction(processingInstruction);
> >
> >         if ( autoFlush ) {
> >             flush();
> >         }
> >     }
> >
> >     /** <p>Prints out a {@link String}, performing
> >       * the necessary entity escaping and whitespace stripping.</p>
> >       *
> >       * @param text is the text to output
> >       */
> >     public void write(String text) throws IOException {
> >         writeString(text);
> >
> >         if ( autoFlush ) {
> >             flush();
> >         }
> >     }
> >
> >     /** Writes the given {@link Text}.
> >       *
> >       * @param text <code>Text</code> to output.
> >       */
> >     public void write(Text text) throws IOException {
> >         writeString(text.getText());
> >
> >         if ( autoFlush ) {
> >             flush();
> >         }
> >     }
> >
> >     /** Writes the given {@link Node}.
> >       *
> >       * @param node <code>Node</code> to output.
> >       */
> >     public void write(Node node) throws IOException {
> >         writeNode(node);
> >
> >         if ( autoFlush ) {
> >             flush();
> >         }
> >     }
> >
> >     /** Writes the given object which should be a String, a Node or a List
> >       * of Nodes.
> >       *
> >       * @param object is the object to output.
> >       */
> >     public void write(Object object) throws IOException {
> >         if (object instanceof Node) {
> >             write((Node) object);
> >         }
> >         else if (object instanceof String) {
> >             write((String) object);
> >         }
> >         else if (object instanceof List) {
> >             List list = (List) object;
> >             for ( int i = 0, size = list.size(); i < size; i++ ) {
> >                 write( list.get(i) );
> >             }
> >         }
> >         else if (object != null) {
> >             throw new IOException( "Invalid object: " + object );
> >         }
> >     }
> >
> >
> >     /** <p>Writes the opening tag of an {@link Element},
> >       * including its {@link Attribute}s
> >       * but without its content.</p>
> >       *
> >       * @param element <code>Element</code> to output.
> >       */
> >     public void writeOpen(Element element) throws IOException {
> >         writer.write("<");
> >         writer.write( element.getQualifiedName() );
> >         writeAttributes(element);
> >         writer.write(">");
> >     }
> >
> >     /** <p>Writes the closing tag of an {@link Element}</p>
> >       *
> >       * @param element <code>Element</code> to output.
> >       */
> >     public void writeClose(Element element) throws IOException {
> >         writeClose( element.getQualifiedName() );
> >     }
> >
> >
> >     // XMLFilterImpl methods
> >
> >     //-------------------------------------------------------------------------
> >     public void parse(InputSource source) throws IOException, SAXException {
> >         installLexicalHandler();
> >         super.parse(source);
> >     }
> >
> >
> >     public void setProperty(String name, Object value) throws SAXNotRecognizedException, SAXNotSupportedException {
> >         for (int i = 0; i < LEXICAL_HANDLER_NAMES.length; i++) {
> >             if (LEXICAL_HANDLER_NAMES[i].equals(name)) {
> >                 setLexicalHandler((LexicalHandler) value);
> >                 return;
> >             }
> >         }
> >         super.setProperty(name, value);
> >     }
> >
> >     public Object getProperty(String name) throws SAXNotRecognizedException, SAXNotSupportedException {
> >         for (int i = 0; i < LEXICAL_HANDLER_NAMES.length; i++) {
> >             if (LEXICAL_HANDLER_NAMES[i].equals(name)) {
> >                 return getLexicalHandler();
> >             }
> >         }
> >         return super.getProperty(name);
> >     }
> >
> >     public void setLexicalHandler (LexicalHandler handler) {
> >         if (handler == null) {
> >             throw new NullPointerException("Null lexical handler");
> >         }
> >         else {
> >             this.lexicalHandler = handler;
> >         }
> >     }
> >
> >     public LexicalHandler getLexicalHandler(){
> >         return lexicalHandler;
> >     }
> >
> >
> >     // ContentHandler interface
> >
> >     //-------------------------------------------------------------------------
> >     public void setDocumentLocator(Locator locator) {
> >         super.setDocumentLocator(locator);
> >     }
> >
> >     public void startDocument() throws SAXException {
> >         try {
> >             writeDeclaration();
> >             super.startDocument();
> >         }
> >         catch (IOException e) {
> >             handleException(e);
> >         }
> >     }
> >
> >     public void endDocument() throws SAXException {
> >         super.endDocument();
> >     }
> >
> >     public void startPrefixMapping(String prefix, String uri) throws SAXException {
> >         super.startPrefixMapping(prefix, uri);
> >     }
> >
> >     public void endPrefixMapping(String prefix) throws SAXException {
> >         super.endPrefixMapping(prefix);
> >     }
> >
> >
> >     public void startElement(String namespaceURI, String localName, String qName, Attributes attributes) throws SAXException {
> >         try {
> >             writePrintln();
> >             indent();
> >             writer.write("<");
> >             writer.write(qName);
> >             writeAttributes( attributes );
> >             writer.write(">");
> >             ++indentLevel;
> >             lastOutputNodeType = Node.ELEMENT_NODE;
> >
> >             super.startElement( namespaceURI, localName, qName, attributes );
> >         }
> >         catch (IOException e) {
> >             handleException(e);
> >         }
> >     }
> >
> >     public void endElement(String namespaceURI, String localName, String qName) throws SAXException {
> >         try {
> >             --indentLevel;
> >             if ( lastOutputNodeType == Node.ELEMENT_NODE ) {
> >                 writePrintln();
> >                 indent();
> >             }
> >
> >             // XXXX: need to determine this using a stack and checking for
> >             // content / children
> >             boolean hadContent = true;
> >             if ( hadContent ) {
> >                 writeClose(qName);
> >             }
> >             else {
> >                 writeEmptyElementClose(qName);
> >             }
> >             lastOutputNodeType = Node.ELEMENT_NODE;
> >
> >             super.endElement( namespaceURI, localName, qName );
> >         }
> >         catch (IOException e) {
> >             handleException(e);
> >         }
> >     }
> >
> >     public void characters(char[] ch, int start, int length) throws SAXException {
> >         try {
> >             write( new String( ch, start, length ) );
> >
> >             super.characters(ch, start, length);
> >         }
> >         catch (IOException e) {
> >             handleException(e);
> >         }
> >     }
> >
> >     public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
> >         super.ignorableWhitespace(ch, start, length);
> >     }
> >
> >     public void processingInstruction(String target, String data) throws SAXException {
> >         try {
> >             indent();
> >             writer.write("<?");
> >             writer.write(target);
> >             writer.write(" ");
> >             writer.write(data);
> >             writer.write("?>");
> >             writePrintln();
> >             lastOutputNodeType = Node.PROCESSING_INSTRUCTION_NODE;
> >
> >             super.processingInstruction(target, data);
> >         }
> >         catch (IOException e) {
> >             handleException(e);
> >         }
> >     }
> >
> >
> >
> >     // DTDHandler interface
> >
> >     //-------------------------------------------------------------------------
> >     public void notationDecl(String name, String publicID, String systemID) throws SAXException {
> >         super.notationDecl(name, publicID, systemID);
> >     }
> >
> >     public void unparsedEntityDecl(String name, String publicID, String systemID, String notationName) throws SAXException {
> >         super.unparsedEntityDecl(name, publicID, systemID, notationName);
> >     }
> >
> >
> >     // LexicalHandler interface
> >
> >     //-------------------------------------------------------------------------
> >     public void startDTD(String name, String publicID, String systemID) throws SAXException {
> >         inDTD = true;
> >         try {
> >             writeDocType(name, publicID, systemID);
> >         }
> >         catch (IOException e) {
> >             handleException(e);
> >         }
> >
> >         if (lexicalHandler != null) {
> >             lexicalHandler.startDTD(name, publicID, systemID);
> >         }
> >     }
> >
> >     public void endDTD() throws SAXException {
> >         inDTD = false;
> >         if (lexicalHandler != null) {
> >             lexicalHandler.endDTD();
> >         }
> >     }
> >
> >     public void startCDATA() throws SAXException {
> >         try {
> >             writer.write( "<![CDATA[" );
> >         }
> >         catch (IOException e) {
> >             handleException(e);
> >         }
> >
> >         if (lexicalHandler != null) {
> >             lexicalHandler.startCDATA();
> >         }
> >     }
> >
> >     public void endCDATA() throws SAXException {
> >         try {
> >             writer.write( "]]>" );
> >         }
> >         catch (IOException e) {
> >             handleException(e);
> >         }
> >
> >         if (lexicalHandler != null) {
> >             lexicalHandler.endCDATA();
> >         }
> >     }
> >
> >     public void startEntity(String name) throws SAXException {
> >         try {
> >             writeEntityRef(name);
> >         }
> >         catch (IOException e) {
> >             handleException(e);
> >         }
> >
> >         if (lexicalHandler != null) {
> >             lexicalHandler.startEntity(name);
> >         }
> >     }
> >
> >     public void endEntity(String name) throws SAXException {
> >         if (lexicalHandler != null) {
> >             lexicalHandler.endEntity(name);
> >         }
> >     }
> >
> >     public void comment(char[] ch, int start, int length) throws SAXException {
> >         if ( showCommentsInDTDs || ! inDTD ) {
> >             try {
> >                 writeComment( new String(ch, start, length) );
> >             }
> >             catch (IOException e) {
> >                 handleException(e);
> >             }
> >         }
> >
> >         if (lexicalHandler != null) {
> >             lexicalHandler.comment(ch, start, length);
> >         }
> >     }
> >
> >
> >
> >     // Implementation methods
> >
> >     //-------------------------------------------------------------------------
> >     protected void writeElement(Element element) throws IOException {
> >         int size = element.nodeCount();
> >         String qualifiedName = element.getQualifiedName();
> >
> >         writePrintln();
> >         indent();
> >
> >         writer.write("<");
> >         writer.write(qualifiedName);
> >
> >         int previouslyDeclaredNamespaces = namespaceStack.size();
> >         Namespace ns = element.getNamespace();
> >         if (isNamespaceDeclaration( ns ) ) {
> >             namespaceStack.push(ns);
> >             writeNamespace(ns);
> >         }
> >
> >         // Print out additional namespace declarations
> >         boolean textOnly = true;
> >         for ( int i = 0; i < size; i++ ) {
> >             Node node = element.node(i);
> >             if ( node instanceof Namespace ) {
> >                 Namespace additional = (Namespace) node;
> >                 if (isNamespaceDeclaration( additional ) ) {
> >                     namespaceStack.push(additional);
> >                     writeNamespace(additional);
> >                 }
> >             }
> >             else if ( node instanceof Element) {
> >                 textOnly = false;
> >             }
> >         }
> >
> >         writeAttributes(element);
> >
> >         lastOutputNodeType = Node.ELEMENT_NODE;
> >
> >         if ( size <= 0 ) {
> >             writeEmptyElementClose(qualifiedName);
> >         }
> >         else {
> >             writer.write(">");
> >             if ( textOnly ) {
> >                 // we have at least one text node so lets assume
> >                 // that its non-empty
> >                 writeElementContent(element);
> >             }
> >             else {
> >                 // we know it's not null or empty from above
> >                 ++indentLevel;
> >
> >                 writeElementContent(element);
> >
> >                 --indentLevel;
> >
> >                 writePrintln();
> >                 indent();
> >             }
> >             writer.write("</");
> >             writer.write(qualifiedName);
> >             writer.write(">");
> >         }
> >
> >         // remove declared namespaceStack from stack
> >         while (namespaceStack.size() > previouslyDeclaredNamespaces) {
> >             namespaceStack.pop();
> >         }
> >
> >         lastOutputNodeType = Node.ELEMENT_NODE;
> >     }
> >
> >     /** Outputs the content of the given element. If whitespace trimming is
> >      * enabled then all adjacent text nodes are appended together before
> >      * the whitespace trimming occurs to avoid problems with multiple
> >      * text nodes being created due to text content that spans parser buffers
> >      * in a SAX parser.
> >      */
> >     protected void writeElementContent(Element element) throws IOException {
> >         if (format.isTrimText()) {
> >             // concatenate adjacent text nodes together
> >             // so that whitespace trimming works properly
> >             Text lastTextNode = null;
> >             StringBuffer buffer = null;
> >             for ( int i = 0, size = element.nodeCount(); i < size; i++ ) {
> >                 Node node = element.node(i);
> >                 if ( node instanceof Text ) {
> >                     if ( lastTextNode == null ) {
> >                         lastTextNode = (Text) node;
> >                     }
> >                     else {
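> >                         // create the buffer only once, so that a run of three
> >                         // or more adjacent text nodes keeps every node's text
> >                         if ( buffer == null ) {
> >                             buffer = new StringBuffer( lastTextNode.getText() );
> >                         }
> >                         buffer.append( ((Text) node).getText() );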
> >                     }
> >                 }
> >                 else {
> >                     if ( lastTextNode != null ) {
> >                         if ( buffer != null ) {
> >                             writeString( buffer.toString() );
> >                             buffer = null;
> >                         }
> >                         else {
> >                             writeString( lastTextNode.getText() );
> >                         }
> >                         lastTextNode = null;
> >                     }
> >                     writeNode(node);
> >                 }
> >             }
> >             if ( lastTextNode != null ) {
> >                 if ( buffer != null ) {
> >                     writeString( buffer.toString() );
> >                     buffer = null;
> >                 }
> >                 else {
> >                     writeString( lastTextNode.getText() );
> >                 }
> >                 lastTextNode = null;
> >             }
> >         }
> >         else {
> >             for ( int i = 0, size = element.nodeCount(); i < size; i++ ) {
> >                 Node node = element.node(i);
> >                 writeNode(node);
> >             }
> >         }
> >     }
> >     protected void writeCDATA(String text) throws IOException {
> >         writer.write( "<![CDATA[" );
> >         writer.write( text );
> >         writer.write( "]]>" );
> >
> >         lastOutputNodeType = Node.CDATA_SECTION_NODE;
> >     }
> >
> >     protected void writeDocType(DocumentType docType) throws IOException {
> >         if (docType != null) {
> >             docType.write( writer );
> >             //writeDocType( docType.getElementName(), docType.getPublicID(), docType.getSystemID() );
> >             writePrintln();
> >         }
> >     }
> >
> >
> >     protected void writeNamespace(Namespace namespace) throws IOException {
> >         if ( namespace != null ) {
> >             String prefix = namespace.getPrefix();
> >             writer.write(" xmlns");
> >             if (prefix != null && prefix.length() > 0) {
> >                 writer.write(":");
> >                 writer.write(prefix);
> >             }
> >             writer.write("=\"");
> >             writer.write(namespace.getURI());
> >             writer.write("\"");
> >         }
> >     }
> >
> >     protected void writeProcessingInstruction(ProcessingInstruction processingInstruction) throws IOException {
> >         //indent();
> >         writer.write( "<?" );
> >         writer.write( processingInstruction.getName() );
> >         writer.write( " " );
> >         writer.write( processingInstruction.getText() );
> >         writer.write( "?>" );
> >         writePrintln();
> >
> >         lastOutputNodeType = Node.PROCESSING_INSTRUCTION_NODE;
> >     }
> >
> >     protected void writeString(String text) throws IOException {
> >         if ( text != null && text.length() > 0 ) {
> >             if ( ESCAPE_TEXT ) {
> >                 text = escapeElementEntities(text);
> >             }
> >
> >             if ( SUPPORT_PAD_TEXT ) {
> >                 if (lastOutputNodeType == Node.ELEMENT_NODE) {
> >                     String padText = getPadText();
> >                     if ( padText != null ) {
> >                         writer.write(padText);
> >                     }
> >                 }
> >             }
> >
> >             if (format.isTrimText()) {
> >                 boolean first = true;
> >                 StringTokenizer tokenizer = new StringTokenizer(text);
> >                 while (tokenizer.hasMoreTokens()) {
> >                     String token = tokenizer.nextToken();
> >                     if ( first ) {
> >                         first = false;
> >                         if ( lastOutputNodeType == Node.TEXT_NODE ) {
> >                             writer.write(" ");
> >                         }
> >                     }
> >                     else {
> >                         writer.write(" ");
> >                     }
> >                     writer.write(token);
> >                     lastOutputNodeType = Node.TEXT_NODE;
> >                 }
> >             }
> >             else {
> >                 lastOutputNodeType = Node.TEXT_NODE;
> >                 writer.write(text);
> >             }
> >         }
> >     }
> >
> >
> >     protected void writeNode(Node node) throws IOException {
> >         int nodeType = node.getNodeType();
> >         switch (nodeType) {
> >             case Node.ELEMENT_NODE:
> >                 writeElement((Element) node);
> >                 break;
> >             case Node.ATTRIBUTE_NODE:
> >                 writeAttribute((Attribute) node);
> >                 break;
> >             case Node.TEXT_NODE:
> >                 writeString(node.getText());
> >                 //write((Text) node);
> >                 break;
> >             case Node.CDATA_SECTION_NODE:
> >                 writeCDATA(node.getText());
> >                 break;
> >             case Node.ENTITY_REFERENCE_NODE:
> >                 writeEntity((Entity) node);
> >                 break;
> >             case Node.PROCESSING_INSTRUCTION_NODE:
> >                 writeProcessingInstruction((ProcessingInstruction) node);
> >                 break;
> >             case Node.COMMENT_NODE:
> >                 writeComment(node.getText());
> >                 break;
> >             case Node.DOCUMENT_NODE:
> >                 write((Document) node);
> >                 break;
> >             case Node.DOCUMENT_TYPE_NODE:
> >                 writeDocType((DocumentType) node);
> >                 break;
> >             case Node.NAMESPACE_NODE:
> >                 // Will be output with attributes
> >                 //write((Namespace) node);
> >                 break;
> >             default:
> >                 throw new IOException( "Invalid node type: " + node );
> >         }
> >     }
> >
> >
> >
> >
> >     protected void installLexicalHandler() {
> >         XMLReader parent = getParent();
> >         if (parent == null) {
> >             throw new NullPointerException("No parent for filter");
> >         }
> >         // try to register for lexical events
> >         for (int i = 0; i < LEXICAL_HANDLER_NAMES.length; i++) {
> >             try {
> >                 parent.setProperty(LEXICAL_HANDLER_NAMES[i], this);
> >                 break;
> >             }
> >             catch (SAXNotRecognizedException ex) {
> >                 // ignore
> >             }
> >             catch (SAXNotSupportedException ex) {
> >                 // ignore
> >             }
> >         }
> >     }
> >
> >     protected void writeDocType(String name, String publicID,
> >             String systemID) throws IOException {
> >         boolean hasPublic = false;
> >
> >         writer.write("<!DOCTYPE ");
> >         writer.write(name);
> >         if ((publicID != null) && (!publicID.equals(""))) {
> >             writer.write(" PUBLIC \"");
> >             writer.write(publicID);
> >             writer.write("\"");
> >             hasPublic = true;
> >         }
> >         if ((systemID != null) && (!systemID.equals(""))) {
> >             if (!hasPublic) {
> >                 writer.write(" SYSTEM");
> >             }
> >             writer.write(" \"");
> >             writer.write(systemID);
> >             writer.write("\"");
> >         }
> >         writer.write(">");
> >         writePrintln();
> >     }
> >
> >     protected void writeEntity(Entity entity) throws IOException {
> >         writeEntityRef( entity.getName() );
> >     }
> >
> >     protected void writeEntityRef(String name) throws IOException {
> >         writer.write( "&" );
> >         writer.write( name );
> >         writer.write( ";" );
> >
> >         lastOutputNodeType = Node.ENTITY_REFERENCE_NODE;
> >     }
> >
> >     protected void writeComment(String text) throws IOException {
> >         if (format.isNewlines()) {
> >             if ( lastOutputNodeType != Node.COMMENT_NODE ) {
> >                 println();
> >             }
> >             indent();
> >         }
> >         writer.write( "<!--" );
> >         writer.write( text );
> >         writer.write( "-->" );
> >
> >         writePrintln();
> >
> >         lastOutputNodeType = Node.COMMENT_NODE;
> >     }
> >
> >     /** Writes the attributes of the given element
> >       *
> >       */
> >     protected void writeAttributes( Element element ) throws IOException {
> >
> >         // I do not yet handle the case where the same prefix maps to
> >         // two different URIs. For attributes on the same element
> >         // this is illegal; but as yet we don't throw an exception
> >         // if someone tries to do this
> >         for ( int i = 0, size = element.attributeCount(); i < size; i++ ) {
> >             Attribute attribute = element.attribute(i);
> >             Namespace ns = attribute.getNamespace();
> >             if (ns != null && ns != Namespace.NO_NAMESPACE && ns !=
> >                     Namespace.XML_NAMESPACE) {
> >                 String prefix = ns.getPrefix();
> >                 String uri = namespaceStack.getURI(prefix);
> >                 // output a new namespace declaration
> >                 if (!ns.getURI().equals(uri)) {
> >                     writeNamespace(ns);
> >                     namespaceStack.push(ns);
> >                 }
> >             }
> >
> >             writer.write(" ");
> >             writer.write(attribute.getQualifiedName());
> >             writer.write("=\"");
> >             writeEscapeAttributeEntities(attribute.getValue());
> >             writer.write("\"");
> >         }
> >     }
> >
> >     protected void writeAttribute(Attribute attribute) throws IOException {
> >         writer.write(" ");
> >         writer.write(attribute.getQualifiedName());
> >         writer.write("=");
> > 
> >         writer.write("\"");
> >         
> >         writeEscapeAttributeEntities(attribute.getValue());
> >         
> >         writer.write("\"");
> >         lastOutputNodeType = Node.ATTRIBUTE_NODE;
> >     }
> > 
> >     protected void writeAttributes(Attributes attributes) throws IOException {
> >         for (int i = 0, size = attributes.getLength(); i < size; i++) {
> >             writeAttribute( attributes, i );
> >         }
> >     }
> > 
> >     protected void writeAttribute(Attributes attributes, int index)
> >             throws IOException {
> >         writer.write(" ");
> >         writer.write(attributes.getQName(index));
> >         writer.write("=\"");        
> >         writeEscapeAttributeEntities(attributes.getValue(index));
> >         writer.write("\"");
> >     }
> > 
> >     
> >     
> >     protected void indent() throws IOException {
> >         String indent = format.getIndent();
> >         if ( indent != null && indent.length() > 0 ) {
> >             for ( int i = 0; i < indentLevel; i++ ) {
> >                 writer.write(indent);
> >             }
> >         }
> >     }
> >
> >     /**
> >      * <p>
> >      * This will print a new line only if the newlines flag was set to
> >      * true
> >      * </p>
> >      *
> >      * @param out <code>Writer</code> to write to
> >      */
> >     protected void writePrintln() throws IOException  {
> >         if (format.isNewlines()) {
> >             writer.write( format.getLineSeparator() );
> >         }
> >     }
> >
> >     /**
> >      * Get an OutputStreamWriter, use preferred encoding.
> >      */
> >     protected Writer createWriter(OutputStream outStream, String encoding)
> >             throws UnsupportedEncodingException {
> >         return new BufferedWriter(
> >             new OutputStreamWriter( outStream, encoding )
> >         );
> >     }
> > 
> >     /**
> >      * <p>
> >      * This will write the declaration to the given Writer.
> >      *   Assumes XML version 1.0 since we don't directly know.
> >      * </p>
> >      */
> >     protected void writeDeclaration() throws IOException {
> >         String encoding = format.getEncoding();
> >         
> >         // Only print if declaration is not suppressed
> >         if (! format.isSuppressDeclaration()) {
> >             // Assume 1.0 version
> >             if (encoding.equals("UTF8")) {
> >                 writer.write("<?xml version=\"1.0\"");
> >                 if (!format.isOmitEncoding()) {
> >                     writer.write(" encoding=\"UTF-8\"");
> >                 }
> >                 writer.write("?>");
> >             } else {
> >                 writer.write("<?xml version=\"1.0\"");
> >                 if (! format.isOmitEncoding()) {
> >                     writer.write(" encoding=\"" + encoding + "\"");
> >                 }
> >                 writer.write("?>");
> >             }
> >             println();
> >         }        
> >     }    
> > 
> >     protected void writeClose(String qualifiedName) throws IOException {
> >         writer.write("</");
> >         writer.write(qualifiedName);
> >         writer.write(">");
> >     }
> > 
> > 
> >     protected void writeEmptyElementClose(String qualifiedName)
> >             throws IOException {
> >         // Simply close up
> >         if (! isExpandEmptyElements()) {
> >             writer.write("/>");
> >         } else {
> >             writer.write("></");
> >             writer.write(qualifiedName);
> >             writer.write(">");
> >         }
> >     }
> >
> >     protected boolean isExpandEmptyElements() {
> >         return format.isExpandEmptyElements();
> >     }
> >
> >
> >     /** This will take the pre-defined entities in XML 1.0 and
> >       * convert their character representation to the appropriate
> >       * entity reference, suitable for XML element text.
> >       */
> >     protected String escapeElementEntities(String text) {
> >         char[] block = null;
> >         int i, last = 0, size = text.length();
> >         for ( i = 0; i < size; i++ ) {
> >             String entity = null;
> >             char c;     // declaration and assignment added by Dan Jacobs
> >             switch( c = text.charAt(i) ) {
> >                 case '<' :
> >                     entity = "&lt;";
> >                     break;
> >                 case '>' :
> >                     entity = "&gt;";
> >                     break;
> >                 case '&' :
> >                     entity = "&amp;";
> >                     break;
> >
> >                 //!!! Begin code added by Dan Jacobs !!!//
> >                 case '\t': case '\n': case '\r':
> >                     // don't encode standard whitespace characters
> >                     break;
> >                 default:
> >                     // encode low and high characters as entities
> >                     if ((c < 32) || (c >= 127))
> >                         entity = "&#" + (int)c + ";";
> >                     break;
> >                 //!!! End code added by Dan Jacobs !!!//
> >             }
> >             if (entity != null) {
> >                 if ( block == null ) {
> >                     block = text.toCharArray();
> >                 }
> >                 buffer.append(block, last, i - last);
> >                 buffer.append(entity);
> >                 last = i + 1;
> >             }
> >         }
> >         if ( last == 0 ) {
> >             return text;
> >         }
> >         if ( last < size ) {
> >             if ( block == null ) {
> >                 block = text.toCharArray();
> >             }
> >             buffer.append(block, last, i - last);
> >         }
> >         String answer = buffer.toString();
> >         buffer.setLength(0);
> >         return answer;
> >     }
> >
> >     protected void writeEscapeAttributeEntities(String text)
> >             throws IOException {
> >         if ( text != null ) {
> >             String escapedText = escapeAttributeEntities( text );
> >             writer.write( escapedText );
> >         }
> >     }
> >     /** This will take the pre-defined entities in XML 1.0 and
> >       * convert their character representation to the appropriate
> >       * entity reference, suitable for XML attributes.
> >       */
> >     protected String escapeAttributeEntities(String text) {
> >         char[] block = null;
> >         int i, last = 0, size = text.length();
> >         for ( i = 0; i < size; i++ ) {
> >             String entity = null;
> >             char c;     // declaration and assignment added by Dan Jacobs
> >             switch( c = text.charAt(i) ) {
> >                 case '<' :
> >                     entity = "&lt;";
> >                     break;
> >                 case '>' :
> >                     entity = "&gt;";
> >                     break;
> >                 case '\'' :
> >                     entity = "&apos;";
> >                     break;
> >                 case '\"' :
> >                     entity = "&quot;";
> >                     break;
> >                 case '&' :
> >                     entity = "&amp;";
> >                     break;
> >
> >                 //!!! Begin code added by Dan Jacobs !!!//
> >                 case '\t': case '\n': case '\r':
> >                     // don't encode standard whitespace characters
> >                     break;
> >                 default:
> >                     // encode low and high characters as entities
> >                     if ((c < 32) || (c >= 127))
> >                         entity = "&#" + (int)c + ";";
> >                     break;
> >                 //!!! End code added by Dan Jacobs !!!//
> >             }
> >             if (entity != null) {
> >                 if ( block == null ) {
> >                     block = text.toCharArray();
> >                 }
> >                 buffer.append(block, last, i - last);
> >                 buffer.append(entity);
> >                 last = i + 1;
> >             }
> >         }
> >         if ( last == 0 ) {
> >             return text;
> >         }
> >         if ( last < size ) {
> >             if ( block == null ) {
> >                 block = text.toCharArray();
> >             }
> >             buffer.append(block, last, i - last);
> >         }
> >         String answer = buffer.toString();
> >         buffer.setLength(0);
> >         return answer;
> >     }
> >
> >     protected boolean isNamespaceDeclaration( Namespace ns ) {
> >         if (ns != null && ns != Namespace.NO_NAMESPACE && ns !=
> >                 Namespace.XML_NAMESPACE) {
> >             String uri = ns.getURI();
> >             if ( uri != null && uri.length() > 0 ) {
> >                 if ( ! namespaceStack.contains( ns ) ) {
> >                     return true;
> >
> >                 }
> >             }
> >         }
> >         return false;
> >     }
> >
> >     protected void handleException(IOException e) throws SAXException {
> >         throw new SAXException(e);
> >     }
> >
> >     protected String getPadText() {
> >         return null;
> >     }
> > }
> >
> >
> >
> >
> > /*
> >  * Redistribution and use of this software and associated documentation
> >  * ("Software"), with or without modification, are permitted provided
> >  * that the following conditions are met:
> >  *
> >  * 1. Redistributions of source code must retain copyright
> >  *    statements and notices.  Redistributions must also contain a
> >  *    copy of this document.
> >  *
> >  * 2. Redistributions in binary form must reproduce the
> >  *    above copyright notice, this list of conditions and the
> >  *    following disclaimer in the documentation and/or other
> >  *    materials provided with the distribution.
> >  *
> >  * 3. The name "DOM4J" must not be used to endorse or promote
> >  *    products derived from this Software without prior written
> >  *    permission of MetaStuff, Ltd.  For written permission,
> >  *    please contact [EMAIL PROTECTED]
> >  *
> >  * 4. Products derived from this Software may not be called "DOM4J"
> >  *    nor may "DOM4J" appear in their names without prior written
> >  *    permission of MetaStuff, Ltd. DOM4J is a registered
> >  *    trademark of MetaStuff, Ltd.
> >  *
> >  * 5. Due credit should be given to the DOM4J Project
> >  *    (http://dom4j.org/).
> >  *
> >  * THIS SOFTWARE IS PROVIDED BY METASTUFF, LTD. AND CONTRIBUTORS
> >  * ``AS IS'' AND ANY EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT
> >  * NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
> >  * FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN NO EVENT SHALL
> >  * METASTUFF, LTD. OR ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT,
> >  * INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
> >  * (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
> >  * SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
> >  * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT,
> >  * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
> >  * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED
> >  * OF THE POSSIBILITY OF SUCH DAMAGE.
> >  *
> >  * Copyright 2001 (C) MetaStuff, Ltd. All Rights Reserved.
> >  *
> >  * $Id: XMLWriter.java,v 1.46 2002/02/14 11:55:46 jstrachan Exp $
> >  */
> >
> 
> 
> 
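
For anyone who wants to try the escaping idea from the quoted patch outside of dom4j, here is a minimal standalone sketch. It is not the dom4j code: the class name NumericEntitySketch and the method escape are made up for illustration. It keeps the patch's thresholds (leave tab, LF and CR alone, write anything below 32 or at or above 127 as a decimal character reference), but as an assumption of mine it walks the string by code point rather than by char, so a surrogate pair comes out as one reference instead of two halves.

```java
public final class NumericEntitySketch {

    private NumericEntitySketch() {}

    /**
     * Escapes the markup characters for element text and writes every
     * character outside printable ASCII as a decimal character reference.
     * Hypothetical helper for illustration; not part of dom4j.
     */
    public static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);       // whole code point, not a lone surrogate
            i += Character.charCount(cp);
            switch (cp) {
                case '<': out.append("&lt;");  break;
                case '>': out.append("&gt;");  break;
                case '&': out.append("&amp;"); break;
                case '\t': case '\n': case '\r':
                    out.append((char) cp);      // standard whitespace passes through
                    break;
                default:
                    if (cp < 32 || cp >= 127) {
                        // same thresholds as the quoted patch
                        out.append("&#").append(cp).append(';');
                    } else {
                        out.append((char) cp);  // printable ASCII is copied as-is
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: a&#160;b &amp; &lt;c&gt;
        System.out.println(escape("a\u00A0b & <c>"));
    }
}
```

Note that, like the patch, this will happily emit references such as &#150; for C1 control characters, and references for C0 controls that XML 1.0 does not allow at all. Escaping does not make such a character legal; it only keeps it from being written as a raw byte.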
-- 
Amelia A. Lewis       [EMAIL PROTECTED]      [EMAIL PROTECTED]
Belief is the wound that knowledge heals, 
  and death begins the Telling of our life.
                -- Teran Penan [Ursula K. Le Guin, "The Telling"]
