Hello all!

 

I have an XML document that includes special characters, for example:

 

<Document>

            <one>melón</one>

            <two>1º</two>

</Document>

 

I want to get it in canonical form, so I do the following (using Apache XML Security and Xerces 2.7.1):

 

            org.apache.xml.security.c14n.Canonicalizer c14n = org.apache.xml.security.c14n.Canonicalizer.getInstance(
                    org.apache.xml.security.transforms.Transforms.TRANSFORM_C14N_EXCL_WITH_COMMENTS);

            byte[] canonicalized = c14n.canonicalize(xmldocument.getBytes());

 

However, I obtain the following exception:

 

org.xml.sax.SAXParseException: Invalid byte 2 of 4-byte UTF-8 sequence.

            at org.apache.xerces.parsers.DOMParser.parse(Unknown Source)

            at org.apache.xerces.jaxp.DocumentBuilderImpl.parse(Unknown Source)

            at org.apache.xml.security.c14n.Canonicalizer.canonicalize(Unknown Source)

 

 

The XML document is ISO-8859-1 encoded because I want to keep the special characters (if I encode it in UTF-8, the document turns into the following:

 

<Document>

            <one>mel?n</one>

            <two>1?</two>

</Document>

 ).
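
For reference, here is a minimal sketch that seems to reproduce the same kind of failure with only the JDK's built-in JAXP parser, without Apache XML Security. It rests on assumptions: that xmldocument is a String with no XML declaration, and that getBytes() ends up producing ISO-8859-1 bytes. The class name EncodingRepro and the helper parseOne are hypothetical names used only for illustration, and I have not checked whether Canonicalizer with Xerces 2.7.1 behaves the same way in every case.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical reproduction class; not part of Xerces or Apache XML Security.
class EncodingRepro {

    // Stand-in for the xmldocument String (assumption: no XML declaration).
    // \u00F3 is 'ó' and \u00BA is 'º'; escapes keep the source file ASCII-only.
    static final String XML =
        "<Document><one>mel\u00F3n</one><two>1\u00BA</two></Document>";

    // Parses the given bytes with the default JAXP parser and returns the
    // text of <one>, or null if the parser rejects the input.
    static String parseOne(byte[] bytes) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(bytes));
            return doc.getElementsByTagName("one").item(0).getTextContent();
        } catch (Exception e) {
            // Covers both SAXException and IOException from the parser.
            return null;
        }
    }

    public static void main(String[] args) {
        // Case 1: ISO-8859-1 bytes, no declaration. The parser assumes UTF-8.
        byte[] iso = XML.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("ISO-8859-1, undeclared: " + parseOne(iso));

        // Case 2: UTF-8 bytes, no declaration.
        byte[] utf8 = XML.getBytes(StandardCharsets.UTF_8);
        System.out.println("UTF-8, undeclared:      " + parseOne(utf8));

        // Case 3: ISO-8859-1 bytes with the encoding declared in the prolog.
        String declared = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>" + XML;
        byte[] isoDeclared = declared.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("ISO-8859-1, declared:   " + parseOne(isoDeclared));
    }
}
```

In ISO-8859-1 the character ó is the single byte 0xF3, which a UTF-8 decoder reads as the lead byte of a 4-byte sequence; the following byte ('n') is not a valid continuation byte, which would match the "Invalid byte 2 of 4-byte UTF-8 sequence" message above.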

 

Could you be so kind as to tell me how to parse an ISO-8859-1 encoded document with Xerces, please?

Thank you very much in advance.

 

Inma.

 
