Hi!

I am trying to parse an HTML string that contains English, Russian and Vietnamese characters.

Sample:

Document doc = builder.parse(new StringBufferInputStream("<html><body>Eng Рус Việt Nam</body></html>"));

The Java file is stored as UTF-8.
I even checked the string "Eng Рус Việt Nam" with an online conversion service. Result: the input string encoding is the same as the output, UTF-8.

The Java application throws this exception in the parse call:

com.sun.org.apache.xerces.internal.impl.io.MalformedByteSequenceException: Invalid byte 2 of 2-byte UTF-8 sequence.
        at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.invalidByte(UTF8Reader.java:691)
        at com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:372)
        at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.load(XMLEntityScanner.java:1743)
        at com.sun.org.apache.xerces.internal.impl.XMLEntityScanner.skipChar(XMLEntityScanner.java:1413)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2823)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:606)
        at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:510)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:848)
        at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:777)
        at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
        at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(DOMParser.java:243)
        at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:348)
        at javax.xml.parsers.DocumentBuilder.parse(DocumentBuilder.java:121)
        at cc.jmitty.PowerWorker.doProceedPDFRequest(PowerWorker.java:268)
        at cc.jmitty.PowerWorker.doSendPDF(PowerWorker.java:187)
        at cc.jmitty.PowerWorker.run(PowerWorker.java:93)


Do you have any idea how to check my string, or is there another solution?
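
One direction that may be worth checking, as a minimal sketch rather than a confirmed fix: the deprecated StringBufferInputStream is documented to use only the low eight bits of each character, so the parser probably never receives real UTF-8 bytes. That would fit the error, since "ệ" (U+1EC7) would be cut down to the single byte 0xC7, which looks like the start of a 2-byte UTF-8 sequence followed by an invalid second byte. The class name ParseUtf8Sketch and the helper names parseFromBytes and parseFromReader below are hypothetical, and the sample string is assumed to be well-formed XML:

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ParseUtf8Sketch {

    // Option 1: encode the string to UTF-8 bytes explicitly and parse the byte stream.
    static Document parseFromBytes(DocumentBuilder builder, String xml) throws Exception {
        byte[] utf8 = xml.getBytes(StandardCharsets.UTF_8);
        return builder.parse(new ByteArrayInputStream(utf8));
    }

    // Option 2: hand the parser characters instead of bytes, so no byte decoding is involved.
    static Document parseFromReader(DocumentBuilder builder, String xml) throws Exception {
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        String html = "<html><body>Eng Рус Việt Nam</body></html>";
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();

        Document fromBytes = parseFromBytes(builder, html);
        Document fromReader = parseFromReader(builder, html);

        // Both documents should carry the same body text as the input string.
        System.out.println(fromBytes.getDocumentElement().getTextContent());
        System.out.println(fromReader.getDocumentElement().getTextContent());
    }
}
```

Either variant avoids StringBufferInputStream. The StringReader form is probably the simpler of the two when the markup is already held in a String, because the parser then has no bytes to decode.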

Dmitry
