Error when parsing ISO-8859-1 encoded documents

Hello all!

I have an XML document that includes special characters, for example:

<one>melón</one>

</Document>

I want to obtain its canonical form, so I do the following (using Apache XML Security and Xerces 2.7.1):

org.apache.xml.security.c14n.Canonicalizer c14n =
    org.apache.xml.security.c14n.Canonicalizer.getInstance(
        org.apache.xml.security.transforms.Transforms.TRANSFORM_C14N_EXCL_WITH_COMMENTS);
byte[] canonicalized = c14n.canonicalize(xmldocument.getBytes());

However, I obtain the following exception:

org.xml.sax.SAXParseException: Invalid byte 2 of 4-byte UTF-8 sequence.
    at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)
    at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at org.apache.xml.security.c14n.Canonicalizer.canonicalize(Unknown Source)
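
A note on the exception, offered as a sketch and not a confirmed diagnosis: in ISO-8859-1 the character ó is the single byte 0xF3, which a UTF-8 decoder reads as the first byte of a four-byte sequence. The message is therefore consistent with ISO-8859-1 bytes being decoded as UTF-8, which is what an XML parser does by default when the document carries no encoding declaration. The example below uses only the JDK's JAXP parser, not Apache XML Security. The class name Latin1ParseSketch and the helper parseLatin1 are hypothetical. It assumes the document declares encoding="ISO-8859-1" and that the bytes are produced with that same charset instead of the platform default used by a bare getBytes().

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class Latin1ParseSketch {

    // Hypothetical helper: encodes the string explicitly as ISO-8859-1
    // and parses the resulting bytes. The XML text is assumed to carry a
    // matching encoding="ISO-8859-1" declaration, so the parser decodes
    // the bytes with the same charset that produced them.
    static Document parseLatin1(String xml) throws Exception {
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(bytes));
    }

    public static void main(String[] args) throws Exception {
        // \u00f3 is the character ó, written as an escape so the source
        // file's own encoding does not matter.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                   + "<one>mel\u00f3n</one>";
        Document doc = parseLatin1(xml);
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

If the declaration and the byte encoding agree in this way, the same byte array could presumably be passed to c14n.canonicalize(...) as well. Note that canonical XML output is always UTF-8, whatever the encoding of the input.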

The XML document is ISO-8859-1 encoded because I want to keep the special characters. If I encode it in UTF-8, the document turns into the following:

</Document>

Could you please tell me how to parse an ISO-8859-1 encoded document with Xerces?

Thank you very much in advance.

Inma.
