I think I've finally fixed the character escaping issue once and for all.
By default, characters > 32 that are not special XML characters
(>, <, & etc) are not escaped, unless you are outputting the XML
using the US-ASCII encoding.
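
To illustrate why the output encoding matters here, below is a minimal standalone sketch. It is not dom4j code: the class name EncodingCheckSketch and its escape method are made up for illustration, and it swaps in java.nio's CharsetEncoder.canEncode in place of the simple numeric threshold that XMLWriter uses. It only shows that a character US-ASCII cannot represent has to be written as a numeric character reference, while a UTF-8 writer can emit it as-is.

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of dom4j.
class EncodingCheckSketch {

    // Escapes the XML special characters, plus any character the target
    // encoder cannot represent. Handles BMP characters only: surrogate
    // pairs are not treated specially in this sketch.
    static String escape(String text, CharsetEncoder encoder) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                default:
                    if (encoder.canEncode(c)) {
                        out.append(c);
                    } else {
                        // numeric character reference, e.g. &#233;
                        out.append("&#").append((int) c).append(';');
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "caf\u00e9 & <b>";
        System.out.println(escape(text, StandardCharsets.US_ASCII.newEncoder()));
        System.out.println(escape(text, StandardCharsets.UTF_8.newEncoder()));
    }
}
```

With US-ASCII the accented character comes out as &#233;; with UTF-8 it is left alone and only &, < and > are escaped.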

If you want to customize whether characters get escaped or not, you can
derive from XMLWriter and override the method

    protected boolean shouldEncodeChar(char c)

which by default uses the property getMaximumAllowedCharacter(). That
property defaults to 127 if the output encoding is US-ASCII; otherwise it
defaults to -1, meaning no characters are encoded. So if you wanted to
encode all characters over 255 you could do

    XMLWriter writer = new XMLWriter( new FileWriter("foo.xml"));
    writer.setMaximumAllowedCharacter(255);
    writer.write(document);
    writer.close();
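
For anyone who wants to see how the two hooks fit together, here is a rough standalone sketch. It is not the XMLWriter source: the classes EscapingWriterSketch and NbspEncodingWriterSketch are hypothetical, and only the method names shouldEncodeChar, getMaximumAllowedCharacter and setMaximumAllowedCharacter mirror the ones described above. The subclass shows the kind of override you might write to always encode &#160; regardless of the threshold.

```java
// Hypothetical stand-in for the escaping logic; not dom4j's implementation.
class EscapingWriterSketch {

    // -1 means "no upper limit", so nothing is encoded because of its value
    private int maximumAllowedCharacter = -1;

    int getMaximumAllowedCharacter() {
        return maximumAllowedCharacter;
    }

    void setMaximumAllowedCharacter(int max) {
        this.maximumAllowedCharacter = max;
    }

    // Override point: decides whether c is written as a numeric reference.
    protected boolean shouldEncodeChar(char c) {
        int max = getMaximumAllowedCharacter();
        return max > 0 && c > max;
    }

    String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                default:
                    if (shouldEncodeChar(c)) {
                        out.append("&#").append((int) c).append(';');
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }
}

// Always encodes the non-breaking space (160), whatever the threshold is.
class NbspEncodingWriterSketch extends EscapingWriterSketch {
    @Override
    protected boolean shouldEncodeChar(char c) {
        return c == '\u00a0' || super.shouldEncodeChar(c);
    }
}
```

With the threshold set to 255, a character such as the euro sign (8364) is written as a numeric reference while accented Latin-1 characters pass through untouched.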

If anyone wants to try out this build before the 1.4 release goes out,
please try CVS, or download a prebuilt dom4j JAR from here:

http://www.ibiblio.org/maven/dom4j/jars/

The version is dom4j-1.4-dev-8.jar. If you're using Maven to build your
project you can just change your dependency version number to 1.4-dev-8 and
you should get the new version.

I'd like to get a real 1.4 release out ASAP so any early feedback will be
greatly appreciated.

James
-------
http://radio.weblogs.com/0112098/

----- Original Message -----
From: "James Elson" <[EMAIL PROTECTED]>
To: "James Strachan" <[EMAIL PROTECTED]>
Cc: "Dan Jacobs" <[EMAIL PROTECTED]>;
<[EMAIL PROTECTED]>
Sent: Friday, July 26, 2002 11:03 AM
Subject: Re: [dom4j-dev] fix for writing numeric entities in XMLWriter


> Shouldn't this depend on the charset of the document? It might be perfectly
> valid (and desirable) to have the characters output as normal characters.
> Using this approach (everything above a certain numeric value is an entity)
> will play havoc with the readability of foreign language documents.
>
> Admittedly, asXML is a "quick way" of getting an XML representation. Maybe
> dom4j's formatting classes should allow us to register a list of characters
> to be rendered as entities.
>
>    James
>
> Quoting James Strachan <[EMAIL PROTECTED]>:
>
> > Hi Dan
> >
> > From: "Dan Jacobs" <[EMAIL PROTECTED]>
> > > I found and fixed a simple bug in XMLWriter.java.  A fixed version is
> > > attached, but not checked into the source repository.  I'll leave that
> > > for the folks who maintain the sources.
> > >
> > > If you parse and print out (using asXML) an XML document containing a
> > > numeric entity reference such as &#160; (the code for &nbsp;) the code
> > > was just writing out a byte with a value of 160 (decimal).  The fix
> > > (abstracted below) is to check for characters with integer codes below
> > > 32 and above 126 (excluding standard whitespace characters) and encode
> > > them as numeric entities.  The same fix is made in two places in
> > > XMLWriter.java.
> > >
> > > Thanks for reading.
> > > -- Dan Jacobs
> > >
> > >             char c;     // declaration and assignment added by Dan Jacobs
> > >             switch( c = text.charAt(i) ) {
> > >                 case '<' :
> > >                     entity = "&lt;";
> > >                     break;
> > >                 case '>' :
> > >                     entity = "&gt;";
> > >                     break;
> > >                 case '&' :
> > >                     entity = "&amp;";
> > >                     break;
> > >
> > >                 //!!! Begin code added by Dan Jacobs !!!//
> > >                 case '\t': case '\n': case '\r':
> > >                     // don't encode standard whitespace characters
> > >                     break;
> > >                 default:
> > >                     // encode low and high characters as entities
> > >                     if ((c < 32) || (c >= 127))
> > >                         entity = "&#" + (int)c + ";";
> > >                     break;
> > >                 //!!! End code added by Dan Jacobs !!!//
> > >             }
> > >
> > > --
> >
> > Sorry this took so long to spot; this mail got a bit lost in my email
> > client for some weird reason.
> >
> > I've applied your patch now - many thanks. It's checked into CVS now. Also,
> > if anyone wants a pre-release build of dom4j with the patch applied you can
> > download it from here:
> >
> > http://jakarta.apache.org/turbine/jars2/dom4j/jars/
> >
> > the latest build is dom4j-1.4-dev-6.jar
> >
> > I'll hopefully get the 1.4 release build working ASAP (am porting the build
> > system to Maven).
> >
> > James
> >
> > __________________________________________________
> > Do You Yahoo!?
> > Everything you'll ever need on one web page
> > from News and Sport to Email and Music Charts
> > http://uk.my.yahoo.com
> >
> >
> > -------------------------------------------------------
> > This sf.net email is sponsored by:ThinkGeek
> > Welcome to geek heaven.
> > http://thinkgeek.com/sf
> > _______________________________________________
> > dom4j-dev mailing list
> > [EMAIL PROTECTED]
> > https://lists.sourceforge.net/lists/listinfo/dom4j-dev
> >
>
>
