Hi, I have a problem with UTF-8. I want to add special characters, such as German umlauts (äöü) or French characters (e.g. é), to my XML file. I can write the characters to an XML file, but when I try to read the generated file with the SAXParser, an error occurs ("Invalid byte 2 of 3-byte UTF-8 sequence"); see the end of this message. The XML file looks fine in my editor (UltraEdit), but my System.out trace shows this: äöü for äöü. With ISO-8859-1 everything works fine. Where is my mistake?
I searched the dom4j FAQ, the cookbook and the internet for this problem, but the only thing I found was this: http://sourceforge.net/mailarchive/message.php?msg_id=10356047
It didn't help...

Thanks for any advice!
Udo Krass

My code:
-------<snip>-------
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.SAXReader;
import org.dom4j.io.XMLWriter;
import org.xml.sax.SAXException;

public class Test {

    public static void main(String[] args) {
        File theFile = new File("C:/t.xml");
        File testOutputFile = new File("C:/t.xml");
        /*
        try {
            parse(theFile);
        } catch (DocumentException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        */
        Document document;
        document = DocumentHelper.createDocument();
        Element root = document.addElement( "go" );
        document.getRootElement().add(DocumentHelper.createText("äöü"));
        try {
            writeToFile(document,testOutputFile,true);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static Document parse(File theFile) throws DocumentException {
        SAXReader reader = new SAXReader();
        Document document = null;
        document = reader.read(theFile);
        return document;
    }

    public static void writeToFile(Document theDocument, File theOutputFile, Boolean trace) throws IOException {
        //lets write to a file
        OutputFormat format = OutputFormat.createCompactFormat();
        format.setEncoding("UTF-8");
        format.setNewlines(true);
        format.setIndentSize(2);
        format.setTrimText(false);
        XMLWriter xmlWriter = new XMLWriter(new FileWriter(theOutputFile), format);
        xmlWriter.write(theDocument);
        xmlWriter.flush();
        xmlWriter.close();
        if (trace) {
            // print the document to System.out
            xmlWriter = new XMLWriter(System.out, format);
            format.setEncoding("UTF-8");
            xmlWriter.write(theDocument);
            xmlWriter.flush();
            xmlWriter.close();
        }
    }
}
-------<snap>-------

The generated XML file:
-------<snip>-------
<?xml version="1.0" encoding="UTF-8"?>
<go>äöü</go>
-------<snap>-------

This is the System.out output when I uncomment the parse section and read the file with the SAXParser:
-------<snip>-------
org.dom4j.DocumentException: Error on line 3 of document file:///C:/t.xml : Invalid byte 2 of 3-byte UTF-8 sequence. Nested exception: Invalid byte 2 of 3-byte UTF-8 sequence.
    at org.dom4j.io.SAXReader.read(SAXReader.java:350)
    at org.dom4j.io.SAXReader.read(SAXReader.java:222)
    at Test.parse(Test.java:51)
    at Test.main(Test.java:25)
Nested exception:
org.xml.sax.SAXParseException: Invalid byte 2 of 3-byte UTF-8 sequence.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:236)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:215)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:386)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:316)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(XMLDocumentFragmentScannerImpl.java:1810)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:368)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:834)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:764)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:148)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1242)
    at org.dom4j.io.SAXReader.read(SAXReader.java:334)
    at org.dom4j.io.SAXReader.read(SAXReader.java:222)
    at Test.parse(Test.java:51)
    at Test.main(Test.java:25)
Nested exception: org.xml.sax.SAXParseException: Invalid byte 2 of 3-byte UTF-8 sequence.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:236)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:215)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:386)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:316)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(XMLDocumentFragmentScannerImpl.java:1810)
<?xml version="1.0" encoding="UTF-8"?>
<go>äöü</go>
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:368)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:834)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:764)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:148)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1242)
    at org.dom4j.io.SAXReader.read(SAXReader.java:334)
    at org.dom4j.io.SAXReader.read(SAXReader.java:222)
    at Test.parse(Test.java:51)
    at Test.main(Test.java:25)
-------<snap>-------
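
A note on the error message, offered as a likely explanation rather than a confirmed diagnosis: java.io.FileWriter encodes characters with the platform default charset, which on many Windows setups is not UTF-8, while the XML declaration still says UTF-8. The sketch below uses only the JDK (no dom4j) to show that "äöü" encoded as ISO-8859-1 is not valid UTF-8, because the byte for "ä" (0xE4) looks like the start of a 3-byte sequence. The class name EncodingDemo and both helper methods are hypothetical names chosen for this illustration, and ISO-8859-1 stands in for whatever the default charset was on the poster's machine.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class EncodingDemo {

    // Encode text through a Writer bound to the given charset. This mimics
    // what a FileWriter does with the platform default charset (assumption:
    // that default was a single-byte charset such as ISO-8859-1 or Cp1252).
    static byte[] encode(String text, Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Strictly decode the bytes as UTF-8 and report malformed input instead
    // of silently replacing it, roughly as an XML parser has to.
    static boolean isValidUtf8(byte[] data) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Unicode escapes keep this source file independent of its own encoding.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<go>\u00e4\u00f6\u00fc</go>";

        byte[] latin1 = encode(xml, StandardCharsets.ISO_8859_1);
        byte[] utf8 = encode(xml, StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 bytes valid as UTF-8: " + isValidUtf8(latin1));
        System.out.println("UTF-8 bytes valid as UTF-8: " + isValidUtf8(utf8));
    }
}
```

If this is indeed the cause, one option in the posted code would be to pass an OutputStream to the writer, for example new XMLWriter(new FileOutputStream(theOutputFile), format), so that dom4j applies the encoding set on the OutputFormat itself. This has not been tested against the poster's setup.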