Dear mailing list users,

I have an XML file that I want to process with Xerces, and I have run into a 
problem with German umlauts. I extracted the code I use into a small test 
program that shows the problem. After reading the mailing list I switched from 
XMLString::transcode to a UTF-8 transcoder, but that does not work either. The 
problematic platform is Red Hat Enterprise Linux (version 3, 32-bit, and 
version 5, 64-bit). On a Solaris 10 platform XMLString::transcode works well, 
but the UTF-8 transcoder does not work there either. The Xerces version is 2.7, 
linked statically.


I don't know where the problem with the transcoder lies. Any help is appreciated.


[[xml example file test.xml]]

<?xml version="1.0" encoding="UTF-8"?><xtest><test>M&#252;nchen</test></xtest>

[[code]]

#include <stdio.h>
#include <string.h>

#include <xercesc/parsers/XercesDOMParser.hpp>
#include <xercesc/dom/DOM.hpp>
#include <xercesc/sax/ErrorHandler.hpp>
#include <xercesc/sax/SAXParseException.hpp>
#include <xercesc/framework/MemBufInputSource.hpp>

#include <xercesc/util/PlatformUtils.hpp>
#include <xercesc/util/XMLString.hpp>
#include <xercesc/util/TransService.hpp>

XERCES_CPP_NAMESPACE_USE

int main(int argc, char** argv)
{
        XMLPlatformUtils::Initialize();

        XercesDOMParser* _parser = new XercesDOMParser();

        _parser->setValidationScheme(XercesDOMParser::Val_Auto);
        _parser->setIncludeIgnorableWhitespace(false);
        _parser->setDoNamespaces(true);
        _parser->setDoSchema(true);

        XMLCh* xFile = NULL;

        xFile = XMLString::transcode("test.xml");
        _parser->parse( xFile );
        XMLString::release(&xFile);

        DOMDocument* _xmlDoc = _parser->getDocument();

        DOMNode* rootNode = _xmlDoc->getDocumentElement();

        char nodeName[1024] = "";
        char nodeValue[1024] = "";
        char* tmp = NULL;

        DOMNode* childNode = rootNode->getFirstChild();
        tmp = XMLString::transcode( childNode->getNodeName() );
        strcpy( nodeName, tmp );
        printf("name [%s]\n", nodeName );fflush(stdout);

        childNode = childNode->getFirstChild();
        tmp = XMLString::transcode( childNode->getNodeValue() );
        strcpy( nodeValue, tmp );
        printf("value xmlstring [%s]\n", nodeValue );fflush(stdout);

        XMLTranscoder* utf8Transcoder ;
        XMLTransService::Codes failReason;

        XMLCh* xmlChars = new XMLCh[ 1024 ];
        unsigned int eaten = 0;
        unsigned char* charSizes = new unsigned char[1024];

        utf8Transcoder = XMLPlatformUtils::fgTransService->makeNewTranscoderFor(
                "UTF-8", failReason, 16*1024);
        utf8Transcoder->transcodeFrom(
                (XMLByte*)childNode->getNodeValue(),
                XMLString::stringLen( childNode->getNodeValue() ),
                xmlChars, 1024, eaten, charSizes );
        printf("value xmlchars [%s] eaten [%d] charSizes [%s]\n",
                xmlChars, eaten, charSizes );

        return 0;
}
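
For reference, the conversion this test program is after runs from UTF-16 code 
units (the XMLCh data that getNodeValue() returns) to UTF-8 bytes. In the 
XMLTranscoder interface that direction is transcodeTo, whereas transcodeFrom 
expects encoded bytes as its input. The following is a minimal sketch of that 
direction using only the standard library; it is not Xerces code. The name 
utf16ToUtf8 is hypothetical, and the sketch assumes that unsigned short holds 
one 16-bit UTF-16 code unit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, not a Xerces API: encodes UTF-16 code units as UTF-8.
// Assumes unsigned short holds one 16-bit code unit. Unpaired surrogates are
// replaced with U+FFFD.
std::string utf16ToUtf8(const unsigned short* src, std::size_t count)
{
    assert(src != 0 || count == 0);
    std::string out;
    for (std::size_t i = 0; i < count; ++i) {
        unsigned long cp = src[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine it with the following low surrogate.
            if (i + 1 < count && src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD;  // low surrogate without a preceding high surrogate
        }
        if (cp < 0x80) {                       // 1 byte: plain ASCII
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {               // 2 bytes: covers the umlauts
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                               // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

For the seven code units of "München" (ü is U+00FC), this yields the eight 
bytes 4D C3 BC 6E 63 68 65 6E.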

[[output after calling test application on RHEL 5]]

name [test]
value xmlstring []
value xmlchars [M] eaten [2] charSizes []

[[output after calling test application on Solaris 10]]

name [test]
value xmlstring [München]
value xmlchars [] eaten [3] charSizes []
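
One possible reading of the "value xmlchars" lines in the two outputs above, 
offered as a guess rather than a confirmed diagnosis: printing an XMLCh buffer 
with %s treats UTF-16 code units as plain bytes, so the result depends on the 
byte order of the machine. The sketch below uses only the standard library. The 
name cStringViewOfUtf16 is hypothetical, and the sketch assumes that the RHEL 
machines are little-endian x86 and that the Solaris 10 machine is big-endian 
SPARC, which is not stated in this post.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: lays out UTF-16 code units as raw bytes in the given
// byte order and returns what a "%s"-style reader would see, that is, all
// bytes up to (not including) the first zero byte.
std::string cStringViewOfUtf16(const unsigned short* src, std::size_t count,
                               bool littleEndian)
{
    std::string bytes;
    for (std::size_t i = 0; i < count; ++i) {
        const char lo = static_cast<char>(src[i] & 0xFF);
        const char hi = static_cast<char>((src[i] >> 8) & 0xFF);
        bytes += littleEndian ? lo : hi;
        bytes += littleEndian ? hi : lo;
    }
    // find() returns npos when there is no zero byte; substr then keeps all.
    return bytes.substr(0, bytes.find('\0'));
}
```

Under these assumptions, the code units of "München" read as "M" in 
little-endian order (bytes 4D 00 ...) and as an empty string in big-endian 
order (bytes 00 4D ...).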


With kind regards,
Mario Freimann
