Your file isn't correct XML. Use a UTF-8 aware editor and things should work out fine.
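You can check whether a file's bytes really are valid UTF-8 without involving an XML parser. Decode the raw bytes with a strict decoder. The sketch below is a standalone JDK example, not dom4j API. It needs Java 7+ for `StandardCharsets`, and `Utf8Check` and `isValidUtf8` are hypothetical names chosen for illustration. It also shows the byte counts discussed in this thread: "äöü" takes six bytes in UTF-8 but only three in ISO-8859-1, and only the UTF-8 bytes decode.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {

    /** Returns true if the given bytes form a valid UTF-8 sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT is the default for a fresh decoder; it is set explicitly here because
        // new String(bytes, UTF_8) would instead silently substitute U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "äöü" written as escapes so the encoding of this source file does not matter
        String umlauts = "\u00e4\u00f6\u00fc";
        byte[] utf8 = umlauts.getBytes(StandardCharsets.UTF_8);        // C3 A4 C3 B6 C3 BC
        byte[] latin1 = umlauts.getBytes(StandardCharsets.ISO_8859_1); // E4 F6 FC
        System.out.println(utf8.length + " bytes, valid UTF-8: " + isValidUtf8(utf8));
        System.out.println(latin1.length + " bytes, valid UTF-8: " + isValidUtf8(latin1));
    }
}
```

To check the file from the question, read its bytes with `java.nio.file.Files.readAllBytes(java.nio.file.Paths.get("C:/t.xml"))` and pass them to `isValidUtf8`.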
With UTF-8, the characters of the basic Latin alphabet are encoded as one byte each, whereas some others, for example åäö, are encoded as two bytes. These two bytes are encoded so that when the parser encounters the first, it knows that the second belongs to the same character. This error occurs when the parser encounters an illegal byte group, i.e. a byte sequence that is not valid UTF-8. You most probably used an editor that produces non-UTF-8 byte groups.

Regards
Erik

-----Original Message-----
From: [EMAIL PROTECTED]
To: dom4j-user@lists.sourceforge.net
Sent: 2005-09-22 23:29
Subject: [dom4j-user] Invalid byte 2 of 3-byte UTF-8 sequence

Hi,

I have a problem with UTF-8. I want to add special characters, like German umlauts (äöü) or French characters (e.g. é), to my XML file. I can add the characters to an XML file, but when I try to read the generated file with the SAXParser, an error occurs ("Invalid byte 2 of 3-byte UTF-8 sequence"). See the end of this message.

The XML file looks fine in my editor (UltraEdit), but my System.out trace shows garbled characters instead of äöü. With ISO-8859-1 everything works fine. Where is my mistake?

I searched the dom4j FAQ, the cookbook and the internet for this problem, but the only thing I found was this:
http://sourceforge.net/mailarchive/message.php?msg_id=10356047
It didn't help...

Thanks for any advice!
Udo Krass

My code:
-------<snip>-------
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.SAXReader;
import org.dom4j.io.XMLWriter;
import org.xml.sax.SAXException;

public class Test {

    public static void main(String[] args) {
        File theFile = new File("C:/t.xml");
        File testOutputFile = new File("C:/t.xml");
        /*try {
            parse(theFile);
        } catch (DocumentException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        */
        Document document;
        document = DocumentHelper.createDocument();
        Element root = document.addElement( "go" );
        document.getRootElement().add(DocumentHelper.createText("äöü"));
        try {
            writeToFile(document,testOutputFile,true);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static Document parse(File theFile) throws DocumentException {
        SAXReader reader = new SAXReader();
        Document document = null;
        document = reader.read(theFile);
        return document;
    }

    public static void writeToFile(Document theDocument, File theOutputFile, Boolean trace) throws IOException {
        //lets write to a file
        OutputFormat format = OutputFormat.createCompactFormat();
        format.setEncoding("UTF-8");
        format.setNewlines(true);
        format.setIndentSize(2);
        format.setTrimText(false);
        XMLWriter xmlWriter = new XMLWriter(new FileWriter(theOutputFile), format);
        xmlWriter.write(theDocument);
        xmlWriter.flush();
        xmlWriter.close();
        if (trace) {
            // print the document to System.out
            xmlWriter = new XMLWriter(System.out, format);
            format.setEncoding("UTF-8");
            xmlWriter.write(theDocument);
            xmlWriter.flush();
            xmlWriter.close();
        }
    }
}
-------<snap>-------

The generated XML file:
-------<snip>-------
<?xml version="1.0" encoding="UTF-8"?>
<go>äöü</go>
-------<snap>-------

This is the System.out
output when I uncomment the parse section and read the file with the SAXParser:
-------<snip>-------
org.dom4j.DocumentException: Error on line 3 of document file:///C:/t.xml : Invalid byte 2 of 3-byte UTF-8 sequence. Nested exception: Invalid byte 2 of 3-byte UTF-8 sequence.
    at org.dom4j.io.SAXReader.read(SAXReader.java:350)
    at org.dom4j.io.SAXReader.read(SAXReader.java:222)
    at Test.parse(Test.java:51)
    at Test.main(Test.java:25)
Nested exception: org.xml.sax.SAXParseException: Invalid byte 2 of 3-byte UTF-8 sequence.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:236)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:215)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:386)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:316)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(XMLDocumentFragmentScannerImpl.java:1810)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:368)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:834)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:764)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:148)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1242)
    at org.dom4j.io.SAXReader.read(SAXReader.java:334)
    at org.dom4j.io.SAXReader.read(SAXReader.java:222)
    at Test.parse(Test.java:51)
    at Test.main(Test.java:25)
Nested exception: org.xml.sax.SAXParseException: Invalid byte 2 of 3-byte UTF-8 sequence.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:236)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:215)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:386)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:316)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(XMLDocumentFragmentScannerImpl.java:1810)
<?xml version="1.0" encoding="UTF-8"?>
<go>äöü</go>
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:368)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:834)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:764)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:148)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1242)
    at org.dom4j.io.SAXReader.read(SAXReader.java:334)
    at org.dom4j.io.SAXReader.read(SAXReader.java:222)
    at Test.parse(Test.java:51)
    at Test.main(Test.java:25)
-------<snap>-------

_______________________________________________
dom4j-user mailing list
dom4j-user@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dom4j-user
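One further observation on the quoted code, offered as a likely explanation rather than a confirmed diagnosis. `writeToFile` passes a `FileWriter` to `XMLWriter`. `FileWriter` encodes with the JVM's platform default charset (on Java versions before 18, typically windows-1252 on a Western European Windows machine). So, as far as I can tell, `format.setEncoding("UTF-8")` changes the XML declaration but not the bytes written to disk. In windows-1252 and ISO-8859-1, ä, ö and ü are the single bytes 0xE4, 0xF6 and 0xFC. A UTF-8 decoder reads 0xE4 as the first byte of a 3-byte sequence, then rejects 0xF6 because it is not a continuation byte. That matches "Invalid byte 2 of 3-byte UTF-8 sequence", and it would also fit ISO-8859-1 working and UltraEdit showing the file correctly.

The sketch below reproduces the mismatch with the JDK's built-in SAX parser instead of dom4j. The class and method names are made up for the example. It writes the same document twice through an `OutputStreamWriter` with an explicit charset, once as ISO-8859-1 and once as UTF-8, and only the UTF-8 bytes parse.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

public class EncodingMismatchDemo {

    // The declaration always claims UTF-8; "äöü" is written as escapes
    static final String XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<go>\u00e4\u00f6\u00fc</go>\n";

    /** Encodes the XML text with the given charset (FileWriter would use the platform default). */
    static byte[] write(Charset charset) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, charset)) {
            out.write(XML);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }

    /** Parses with the JDK's SAX parser; returns null on success, otherwise a description of the error. */
    static String parse(byte[] data) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new ByteArrayInputStream(data), new DefaultHandler());
            return null;
        } catch (Exception e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        // Mismatch: declared UTF-8, actual bytes ISO-8859-1 (E4 F6 FC for the umlauts)
        System.out.println("ISO-8859-1 bytes: " + parse(write(StandardCharsets.ISO_8859_1)));
        // Consistent: declaration and bytes are both UTF-8
        System.out.println("UTF-8 bytes: " + parse(write(StandardCharsets.UTF_8)));
    }
}
```

In dom4j terms, the corresponding change would presumably be to pass `XMLWriter` an `OutputStream` (e.g. `new FileOutputStream(theOutputFile)`) instead of a `FileWriter`. `XMLWriter` then does the encoding itself, using the encoding set on the `OutputFormat`.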