Hello all!

I have an XML document which includes special characters, for example:

    <Document>
      <one>melón</one>
      <two>1º</two>
    </Document>

I want to get it in canonical form, so I do the following (using Apache XML Security and Xerces 2.7.1):

    org.apache.xml.security.c14n.Canonicalizer c14n =
        org.apache.xml.security.c14n.Canonicalizer.getInstance(
            org.apache.xml.security.transforms.Transforms.TRANSFORM_C14N_EXCL_WITH_COMMENTS);
    byte[] canonicalized = c14n.canonicalize(xmldocument.getBytes());

However, I obtain the following exception:

    org.xml.sax.SAXParseException: Invalid byte 2 of 4-byte UTF-8 sequence.
        at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
        at org.apache.xml.security.c14n.Canonicalizer.canonicalize(Unknown Source)

The XML document is ISO-8859-1 encoded because I want to keep the special characters (if I encode it in UTF-8, the document turns into the following):

    <Document>
      <one>mel?n</one>
      <two>1?</two>
    </Document>

Could you be so kind as to tell me how to parse an ISO-8859-1 encoded document with Xerces, please?

Thank you very much in advance.

Inma.
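A note on the exception above: an XML parser assumes UTF-8 when the document carries no encoding declaration, and in ISO-8859-1 the letter "ó" is the single byte 0xF3, which UTF-8 treats as the start of a 4-byte sequence. That matches the "Invalid byte 2 of 4-byte UTF-8 sequence" message. The following is a minimal sketch, not a confirmed fix for this setup. It assumes the document string has no XML declaration of its own, uses only the JDK's JAXP parser rather than Xerces 2.7.1 directly, and relies on `StandardCharsets` (Java 7 or later). The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical class name, for illustration only.
class Latin1ParseSketch {

    // Declares ISO-8859-1 in the XML declaration AND encodes the bytes
    // with that same charset, so the declaration and the bytes agree.
    // Assumes xmlBody does not already start with an XML declaration.
    static byte[] toLatin1Bytes(String xmlBody) {
        String declared = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" + xmlBody;
        return declared.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Parses raw bytes; the parser picks the encoding from the declaration.
    static Document parse(byte[] xmlBytes) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xmlBytes));
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Document><one>mel\u00f3n</one><two>1\u00ba</two></Document>";
        Document doc = parse(toLatin1Bytes(xml));
        // What the console shows depends on the terminal's own encoding.
        System.out.println(doc.getElementsByTagName("one").item(0).getTextContent());
    }
}
```

The same byte array could then be handed to `c14n.canonicalize(...)` in place of `xmldocument.getBytes()`, which uses the platform default charset. Note that Canonical XML output is always UTF-8, so the canonicalized bytes should be decoded as UTF-8 when read back.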
- Error when parsing ISO-8859-1 encoded documents Inma Marín López
- Re: Error when parsing ISO-8859-1 encoded docume... Stanimir Stamenkov
- Re: Error when parsing ISO-8859-1 encoded do... Michael Glavassevich
