I think I've finally fixed the character escaping issue once and for all. By default, characters > 32 that are not special XML characters (>, <, & etc.) are not escaped, unless you are outputting the XML using the US-ASCII encoding.
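As a rough illustration of that default rule, here is a standalone sketch. It is not the actual XMLWriter source: the class EscapeSketch, the methods escape and needsEncoding, and the maxAllowedChar parameter are hypothetical names made up for the example. It assumes a threshold of 127 stands in for US-ASCII output, that a non-positive threshold means "no limit", and that control characters below 32 (other than tab, newline and carriage return) are always written as numeric entities.

```java
// Illustrative sketch only; names and structure are assumptions, not dom4j code.
class EscapeSketch {

    // True if c lies above the configured maximum and must become &#NNN;.
    // A non-positive maximum is taken to mean "no limit".
    public static boolean needsEncoding(char c, int maxAllowedChar) {
        return maxAllowedChar > 0 && c > maxAllowedChar;
    }

    // Escapes element text according to the rule sketched above.
    public static String escape(String text, int maxAllowedChar) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '<':
                    out.append("&lt;");
                    break;
                case '>':
                    out.append("&gt;");
                    break;
                case '&':
                    out.append("&amp;");
                    break;
                case '\t':
                case '\n':
                case '\r':
                    // standard whitespace is written unchanged
                    out.append(c);
                    break;
                default:
                    if (c < 32 || needsEncoding(c, maxAllowedChar)) {
                        out.append("&#").append((int) c).append(';');
                    } else {
                        out.append(c);
                    }
                    break;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "caf\u00e9 < bar";
        // No limit: the accented character is written as-is.
        System.out.println(escape(text, -1));
        // US-ASCII style limit: prints caf&#233; &lt; bar
        System.out.println(escape(text, 127));
    }
}
```

With no limit only the three special characters change, so foreign language text stays readable; with a limit of 127 everything outside ASCII becomes a numeric entity.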
If you want to customize whether characters get escaped, you can derive from XMLWriter and override the method

    protected boolean shouldEncodeChar(char c)

which by default uses the property getMaximumAllowedCharacter(). That property defaults to 127 if the output encoding is US-ASCII; otherwise it defaults to -1, meaning no encoding. So if you wanted to encode all characters over 255 you could do

    XMLWriter writer = new XMLWriter( new FileWriter("foo.xml"));
    writer.setMaximumAllowedCharacter(255);
    writer.write(document);
    writer.close();

If anyone wants to try out this build before the 1.4 release goes out, please check out CVS, or you can download a prebuilt dom4j JAR from here:

    http://www.ibiblio.org/maven/dom4j/jars/

The version is dom4j-1.4-dev-8.jar. If you're using Maven to build your project you can just change your dependency version number to 1.4-dev-8 and you should get the new version.

I'd like to get a real 1.4 release out ASAP, so any early feedback will be greatly appreciated.

James
-------
http://radio.weblogs.com/0112098/

----- Original Message -----
From: "James Elson" <[EMAIL PROTECTED]>
To: "James Strachan" <[EMAIL PROTECTED]>
Cc: "Dan Jacobs" <[EMAIL PROTECTED]>; <[EMAIL PROTECTED]>
Sent: Friday, July 26, 2002 11:03 AM
Subject: Re: [dom4j-dev] fix for writing numeric entities in XMLWriter

> Shouldn't this depend on the charset of the document? It might be perfectly
> valid (and desirable) to have the characters output as normal characters. Using
> this approach (everything above a numeric value is an entity) will play havoc
> with the readability of foreign language documents.
>
> Admittedly, asXML is a "quick way" of getting an XML representation. Maybe
> dom4j's formatting classes should allow us to register a list of characters to be
> rendered as entities.
>
> James
>
> Quoting James Strachan <[EMAIL PROTECTED]>:
>
> > Hi Dan
> >
> > From: "Dan Jacobs" <[EMAIL PROTECTED]>
> > > I found and fixed a simple bug in XMLWriter.java. A fixed version is
> > > attached, but not checked into the source repository. I'll leave that
> > > for the folks who maintain the sources.
> > >
> > > If you parse and print out (using asXML) an XML document containing a
> > > numeric entity reference such as &#160; (the code for &nbsp;), the code
> > > was just writing out a byte with a value of 160 (decimal). The fix
> > > (abstracted below) is to check for characters with integer codes below
> > > 32 and above 126 (excluding standard whitespace characters) and encode
> > > them as numeric entities. The same fix is made in two places in
> > > XMLWriter.java.
> > >
> > > Thanks for reading.
> > > -- Dan Jacobs
> > >
> > > char c; // declaration and assignment added by Dan Jacobs
> > > switch( c = text.charAt(i) ) {
> > >     case '<' :
> > >         entity = "&lt;";
> > >         break;
> > >     case '>' :
> > >         entity = "&gt;";
> > >         break;
> > >     case '&' :
> > >         entity = "&amp;";
> > >         break;
> > >
> > >     //!!! Begin code added by Dan Jacobs !!!//
> > >     case '\t': case '\n': case '\r':
> > >         // don't encode standard whitespace characters
> > >         break;
> > >     default:
> > >         // encode low and high characters as entities
> > >         if ((c < 32) || (c >= 127))
> > >             entity = "&#" + (int)c + ";";
> > >         break;
> > >     //!!! End code added by Dan Jacobs !!!//
> > > }
> > >
> > > --
> >
> > Sorry this took so long to spot; this mail got a bit lost in my email client
> > for some weird reason.
> >
> > I've applied your patch now - many thanks. It's checked into CVS now. Also if
> > anyone wants a pre-release build of dom4j with the patch applied you can
> > download it from here:-
> >
> > http://jakarta.apache.org/turbine/jars2/dom4j/jars/
> >
> > the latest build is dom4j-1.4-dev-6.jar
> >
> > I'll hopefully get the 1.4 release build working ASAP (am porting the build
> > system to Maven).
> >
> > James
_______________________________________________
dom4j-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/dom4j-dev