Hi,

I have a problem with UTF-8.
I want to add special characters, such as German umlauts (äöü) or French
characters (e.g. é), to my XML file.
I can write the characters to an XML file, but when I try to read the generated
file with the SAXParser, an error occurs (Invalid byte 2 of 3-byte UTF-8
sequence); see the end of this message.
The XML file looks fine in my editor (UltraEdit), but my System.out trace shows
this: äöü for äöü.
With ISO-8859-1 everything works fine.
Where is my mistake?

I searched the dom4j FAQ, the cookbook and the internet for this problem, but
the only thing I found was this:
http://sourceforge.net/mailarchive/message.php?msg_id=10356047
It didn't help...

Thanks for any advice!

Udo Krass

My code:
-------<snip>-------
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

import org.dom4j.Document;
import org.dom4j.DocumentException;
import org.dom4j.DocumentHelper;
import org.dom4j.Element;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.SAXReader;
import org.dom4j.io.XMLWriter;
import org.xml.sax.SAXException;

public class Test {

    public static void main(String[] args) {
        File theFile = new File("C:/t.xml");

        File testOutputFile = new File("C:/t.xml");
        /*try {
            parse(theFile);
        } catch (DocumentException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } */
       
        // build a minimal document: <go>äöü</go>
        Document document = DocumentHelper.createDocument();
        Element root = document.addElement("go");
        root.addText("äöü");
        
        try {
            writeToFile(document,testOutputFile,true);
        } catch (IOException e) {
            e.printStackTrace();
        }

    }
    // reads the given XML file into a dom4j Document
    public static Document parse(File theFile) throws DocumentException {
        SAXReader reader = new SAXReader();
        return reader.read(theFile);
    }

    public static void writeToFile(Document theDocument, File theOutputFile, boolean trace) throws IOException {
        // let's write to a file
        OutputFormat format = OutputFormat.createCompactFormat();
        format.setEncoding("UTF-8");
        format.setNewlines(true);
        format.setIndentSize(2);
        format.setTrimText(false);

        XMLWriter xmlWriter = new XMLWriter(new FileWriter(theOutputFile), format);
        xmlWriter.write(theDocument);
        xmlWriter.flush();
        xmlWriter.close();

        if (trace) {
            // print the document to System.out
            xmlWriter = new XMLWriter(System.out, format);
            xmlWriter.write(theDocument);
            xmlWriter.flush();
            xmlWriter.close();
        }
    }
}
-------<snap>-------
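
One thing worth checking in the code above: as far as I can tell, format.setEncoding("UTF-8") only sets the encoding named in the XML declaration when XMLWriter is given a Writer, while the Writer itself encodes characters with its own charset. new FileWriter(File) uses the platform default charset, which on many Windows systems is windows-1252 (an assumption about this machine). The sketch below uses only JDK classes to show the difference at the byte level; the class and method names are made up for illustration, and ISO-8859-1 stands in for the platform default.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class WriterCharsetDemo {

    // a-umlaut, o-umlaut, u-umlaut, written as Unicode escapes so the
    // result does not depend on the encoding of this source file.
    static final String UMLAUTS = "\u00e4\u00f6\u00fc";

    // Encodes the text through a Writer bound to the given charset and
    // returns the resulting bytes as space-separated uppercase hex.
    static String hexBytes(String text, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : out.toByteArray()) {
            if (hex.length() > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // Single-byte encoding: one byte per umlaut.
        System.out.println(hexBytes(UMLAUTS, StandardCharsets.ISO_8859_1)); // E4 F6 FC
        // UTF-8: two bytes per umlaut.
        System.out.println(hexBytes(UMLAUTS, StandardCharsets.UTF_8)); // C3 A4 C3 B6 C3 BC
    }
}
```

With dom4j, the corresponding change would presumably be to hand XMLWriter an OutputStream (as the trace branch already does with System.out) or an OutputStreamWriter created with "UTF-8", so that the bytes on disk match the declaration.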

The generated XML file:
-------<snip>-------
<?xml version="1.0" encoding="UTF-8"?>

<go>äöü</go>
-------<snap>-------

This is the System.out output when I uncomment the parse section and read the
file with the SAXParser:
-------<snip>-------
org.dom4j.DocumentException: Error on line 3 of document file:///C:/t.xml : Invalid byte 2 of 3-byte UTF-8 sequence. Nested exception: Invalid byte 2 of 3-byte UTF-8 sequence.
    at org.dom4j.io.SAXReader.read(SAXReader.java:350)
    at org.dom4j.io.SAXReader.read(SAXReader.java:222)
    at Test.parse(Test.java:51)
    at Test.main(Test.java:25)
Nested exception:
org.xml.sax.SAXParseException: Invalid byte 2 of 3-byte UTF-8 sequence.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:236)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:215)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:386)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:316)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(XMLDocumentFragmentScannerImpl.java:1810)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:368)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:834)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:764)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:148)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1242)
    at org.dom4j.io.SAXReader.read(SAXReader.java:334)
    at org.dom4j.io.SAXReader.read(SAXReader.java:222)
    at Test.parse(Test.java:51)
    at Test.main(Test.java:25)
Nested exception: org.xml.sax.SAXParseException: Invalid byte 2 of 3-byte UTF-8 sequence.
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.createSAXParseException(ErrorHandlerWrapper.java:236)
    at com.sun.org.apache.xerces.internal.util.ErrorHandlerWrapper.fatalError(ErrorHandlerWrapper.java:215)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:386)
    at com.sun.org.apache.xerces.internal.impl.XMLErrorReporter.reportError(XMLErrorReporter.java:316)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(XMLDocumentFragmentScannerImpl.java:1810)
<?xml version="1.0" encoding="UTF-8"?>

<go>äöü</go>

    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:368)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:834)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:764)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:148)
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1242)
    at org.dom4j.io.SAXReader.read(SAXReader.java:334)
    at org.dom4j.io.SAXReader.read(SAXReader.java:222)
    at Test.parse(Test.java:51)
    at Test.main(Test.java:25)
-------<snap>-------
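
The error message is consistent with bytes in a single-byte encoding being read as UTF-8: in ISO-8859-1 and windows-1252, ä is the byte 0xE4, which a UTF-8 decoder treats as the start of a 3-byte sequence, and the following byte 0xF6 (ö) is not a valid continuation byte. The sketch below tries to reproduce this with the JDK's built-in SAX parser instead of dom4j, so it is only an approximation of the setup above; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class DeclaredEncodingDemo {

    // Same content as the generated file; the umlauts are written as
    // Unicode escapes so this source file stays plain ASCII.
    static final String XML =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n\n<go>\u00e4\u00f6\u00fc</go>";

    // Encodes the document with actualCharset (while the declaration
    // still claims UTF-8) and reports whether the SAX parser accepts it.
    static boolean parses(Charset actualCharset) {
        byte[] bytes = XML.getBytes(actualCharset);
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new ByteArrayInputStream(bytes), new DefaultHandler());
            return true;
        } catch (SAXException | IOException e) {
            System.out.println("Parse failed: " + e.getMessage());
            return false;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Bytes match the declaration: the parse succeeds.
        System.out.println("UTF-8 bytes parse: " + parses(StandardCharsets.UTF_8));
        // Bytes are single-byte encoded but declared as UTF-8: the parse fails.
        System.out.println("ISO-8859-1 bytes parse: " + parses(StandardCharsets.ISO_8859_1));
    }
}
```

This would also explain why everything works with ISO-8859-1: in that case the declared encoding and the bytes actually written happen to agree.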





