utf-8 encoded attribute values

Robert Parker Tue, 07 Jun 2005 04:53:45 -0700

Hi
 
I am parsing an XML string that is encoded in UTF-8 and I am using the
following code to view element attributes:
 
    DOM_NamedNodeMap NodeMap    = node.getAttributes();
    if ( NodeMap != NULL) {
 
        unsigned int len = NodeMap.getLength();
        for ( unsigned int i = 0; i < len; ++i) {
            DOM_Node attr = NodeMap.item(i);
 
            DOMString tag = attr.getNodeName();
            char *t = tag.transcode();
            printf ("    %s=", t );
            delete [] t;
 
            DOMString value     = attr.getNodeValue();
            t = value.transcode();
            printf ("%s\n", t );
            delete [] t;
            // Dump each raw character of the value with its hex code.
            for ( unsigned int j = 0; j < value.length(); j++ ) {
                printf( " AT %u %c %02x\n", j, value.charAt(j), value.charAt(j) );
            }
        }
    }
 
Both the transcoded value and the "raw" value.charAt() output show my parsed
attribute value as Latin-1.
 
It seems to me that Xerces converts the UTF-8 encoded attribute values
during the parse. 
How can I get Xerces to return the actual UTF-8 encoded data rather than
the Latin-1 representation?
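
To make the question concrete, here is a minimal sketch of the conversion being asked for. It assumes the parsed value is held as 16-bit UTF-16 code units (which is how Xerces-C++ represents XMLCh strings internally) and re-encodes them as UTF-8 bytes. It has no Xerces dependency: CodeUnit, appendUtf8 and utf16ToUtf8 are hypothetical names used for illustration only, not part of the Xerces API, and nothing here has been checked against 1.5.2.

```cpp
#include <cstddef>
#include <string>

// Stand-in for a 16-bit code unit such as XMLCh (assumption: UTF-16).
typedef unsigned short CodeUnit;

// Append the UTF-8 encoding of one Unicode code point to out.
static void appendUtf8(unsigned long cp, std::string &out) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: re-encode UTF-16 code units as UTF-8 bytes.
// Unpaired surrogates are passed through unchecked in this sketch.
std::string utf16ToUtf8(const CodeUnit *units, std::size_t count) {
    std::string out;
    for (std::size_t i = 0; i < count; ++i) {
        unsigned long cp = units[i];
        // Combine a high/low surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < count &&
            units[i + 1] >= 0xDC00 && units[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (units[i + 1] - 0xDC00);
            ++i;
        }
        appendUtf8(cp, out);
    }
    return out;
}
```

For example, the single code unit 0x00E9 (é) comes out as the two bytes C3 A9, whereas its Latin-1 form is the single byte E9. That difference is what the hex dump above would reveal.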
 
(I am using Xerces 1.5.2. I know it's old, but I'm trying to avoid a
massive upgrade exercise if at all possible.)
 
thanks
Robert



