Bugs item #1003141 was opened at 2004-08-04 02:50
Message generated for change (Comment added) made by nobody
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=116035&aid=1003141&group_id=16035

Category: None
Group: None
Status: Closed
Resolution: Fixed
Priority: 5
Submitted By: Zhang Tao (robintj)
Assigned to: Maarten Coene (maartenc)
Summary: SAXReader.read(File file) character encoding problem

Initial Comment:
In the SAXReader.read(File file) function, the code is:

    return read( new InputSource(new FileReader(file)) );

But the documentation of the FileReader class says:

    Convenience class for reading character files. The constructors
    of this class assume that the default character encoding and the
    default byte-buffer size are appropriate. To specify these values
    yourself, construct an InputStreamReader on a FileInputStream.

    FileReader is meant for reading streams of characters. For
    reading streams of raw bytes, consider using a FileInputStream.

This means FileReader always uses the "default character encoding".
When I use this code in my program:

    File f = new File(fname);
    SAXReader reader = new SAXReader();
    Document doc = reader.read(f);

it cannot read the Chinese characters correctly from an XML file that
uses UTF-8 encoding (my system default character encoding is
zh_CN.GBK). But if I change the code to:

    Document doc = reader.read(new FileInputStream(f));

everything works. So I suggest that the SAXReader.read(File file)
function be changed to:

    return read( new InputSource(new FileInputStream(file)) );

----------------------------------------------------------------------

Comment By: Nobody/Anonymous (nobody)
Date: 2004-10-08 08:40

Message:
Logged In: NO

Hmm, I have a problem with this...

I have a string res which contains a complete document (including
the encoding=iso-8859-1 header). If I call

    Document receive = DocumentHelper.parseText(res);

several characters inside the document are corrupted, and the
encoding is set to UTF-8 despite the fact that the string res already
declares the correct encoding.

If I write the same string to a file, the file looks fine, but reading
it back does not produce a valid result:

    FileWriter fw = new FileWriter(
        new java.io.File("C:/temp/resultat.xml"));
    fw.write(res);
    fw.flush();
    fw.close();

    SAXReader saxreader = new SAXReader();
    saxreader.setXMLReaderClassName("org.apache.xerces.parsers.SAXParser");
    Document receive = saxreader.read("C:/temp/resultat.xml");

It shows up as a UTF-8 encoded document, with some of the characters
corrupted. What on earth is wrong?!

----------------------------------------------------------------------

Comment By: Maarten Coene (maartenc)
Date: 2004-08-04 03:07

Message:
Logged In: YES
user_id=178745

This has already been fixed in dom4j 1.5! Thanks for the report.

Maarten

----------------------------------------------------------------------
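
For readers following this thread, the difference described in the initial report can be reproduced with the JDK's own parser, without dom4j. The sketch below is an illustration only, not dom4j's actual fix. The class EncodingDemo and its helper methods are hypothetical names, and ISO-8859-1 stands in for a platform default that does not match the file (the report used GBK). It writes a UTF-8 document, then parses it once from a byte stream, where the parser can read the XML declaration, and once through a Reader that has already decoded the bytes with the wrong charset.

```java
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class; none of these names come from dom4j.
class EncodingDemo {

    // Writes a small XML document to a temporary file as UTF-8 bytes,
    // with a declaration that matches the bytes on disk.
    static Path writeUtf8Sample(String text) throws Exception {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><root>" + text + "</root>";
        Path file = Files.createTempFile("encoding-demo", ".xml");
        Files.write(file, xml.getBytes(StandardCharsets.UTF_8));
        return file;
    }

    // Byte-stream input: the parser decodes the bytes itself, guided by
    // the encoding named in the XML declaration.
    static String parseFromBytes(Path file) throws Exception {
        try (InputStream in = Files.newInputStream(file)) {
            return rootText(new InputSource(in));
        }
    }

    // Character-stream input: the bytes are decoded by the Reader before
    // the parser sees them, so the declared encoding is not consulted.
    // The charset parameter plays the role of FileReader's platform default.
    static String parseFromReader(Path file, Charset charset) throws Exception {
        try (Reader reader = new InputStreamReader(Files.newInputStream(file), charset)) {
            return rootText(new InputSource(reader));
        }
    }

    // Parses the source and returns the text content of the root element.
    private static String rootText(InputSource source) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(source)
                .getDocumentElement()
                .getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Two Chinese characters, written as escapes so the source file's
        // own encoding does not matter.
        String original = "\u4e2d\u6587";
        Path file = writeUtf8Sample(original);

        System.out.println("byte stream preserves text: "
                + original.equals(parseFromBytes(file)));
        System.out.println("mismatched reader preserves text: "
                + original.equals(parseFromReader(file, StandardCharsets.ISO_8859_1)));

        Files.delete(file);
    }
}
```

The byte-stream parse should return the original text, while the mismatched Reader should return corrupted characters. The same reasoning may explain the second comment: FileWriter also encodes with the platform default, so the bytes on disk need not match the encoding the document declares. Writing through an OutputStreamWriter with an explicit charset is one way to keep the two in agreement.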