Dear mailing list users,
I have an XML file that I want to process with Xerces, and I now have a problem
with German umlauts. I extracted the relevant code to show the problem. After
reading the mailing list, I switched from XMLString::transcode to a UTF-8
transcoder, but that doesn't work either. The problematic platform is Red Hat
Enterprise Linux (RHEL 3, 32-bit, and RHEL 5, 64-bit). On Solaris 10,
XMLString::transcode works, but the transcoder fails there too. The Xerces
version is 2.7, linked statically.
I don't know where the problem with the transcoder is; any help is appreciated.
[[xml example file test.xml]]
<?xml version="1.0" encoding="UTF-8"?><xtest><test>München</test></xtest>
[[code]]
#include <stdio.h>
#include <string.h>   /* strcpy */
#include <xercesc/parsers/XercesDOMParser.hpp>
#include <xercesc/dom/DOM.hpp>
#include <xercesc/sax/ErrorHandler.hpp>
#include <xercesc/sax/SAXParseException.hpp>
#include <xercesc/framework/MemBufInputSource.hpp>
#include <xercesc/util/PlatformUtils.hpp>
#include <xercesc/util/XMLString.hpp>
#include <xercesc/util/TransService.hpp>
XERCES_CPP_NAMESPACE_USE
int main(int argc, char** argv)
{
    XMLPlatformUtils::Initialize();

    // Parse test.xml with namespace and schema support enabled
    XercesDOMParser* _parser = new XercesDOMParser();
    _parser->setValidationScheme(XercesDOMParser::Val_Auto);
    _parser->setIncludeIgnorableWhitespace(false);
    _parser->setDoNamespaces(true);
    _parser->setDoSchema(true);
    XMLCh* xFile = XMLString::transcode("test.xml");
    _parser->parse( xFile );
    XMLString::release(&xFile);

    DOMDocument* _xmlDoc = _parser->getDocument();
    DOMNode* rootNode = _xmlDoc->getDocumentElement();
    char nodeName[1024] = "";
    char nodeValue[1024] = "";
    char* tmp = NULL;

    // <test> element: print its name via XMLString::transcode
    DOMNode* childNode = rootNode->getFirstChild();
    tmp = XMLString::transcode( childNode->getNodeName() );
    strcpy( nodeName, tmp );
    XMLString::release(&tmp);
    printf("name [%s]\n", nodeName );fflush(stdout);

    // Text node inside <test>: print its value via XMLString::transcode
    childNode = childNode->getFirstChild();
    tmp = XMLString::transcode( childNode->getNodeValue() );
    strcpy( nodeValue, tmp );
    XMLString::release(&tmp);
    printf("value xmlstring [%s]\n", nodeValue );fflush(stdout);

    // Second attempt: an explicit UTF-8 transcoder
    XMLTranscoder* utf8Transcoder;
    XMLTransService::Codes failReason;
    XMLCh* xmlChars = new XMLCh[ 1024 ];
    unsigned int eaten = 0;
    unsigned char* charSizes = new unsigned char[1024];
    utf8Transcoder =
        XMLPlatformUtils::fgTransService->makeNewTranscoderFor("UTF-8", failReason,
            16*1024);
    utf8Transcoder->transcodeFrom( (XMLByte*)childNode->getNodeValue(),
        XMLString::stringLen( childNode->getNodeValue() ), xmlChars, 1024, eaten,
        charSizes );
    printf("value xmlchars [%s] eaten [%u] charSizes [%s]\n", xmlChars,
        eaten, charSizes );

    // Clean up
    delete utf8Transcoder;
    delete [] charSizes;
    delete [] xmlChars;
    delete _parser;
    XMLPlatformUtils::Terminate();
    return 0;
}
[[output of the test application on RHEL 5]]
name [test]
value xmlstring []
value xmlchars [M] eaten [2] charSizes []
[[output of the test application on Solaris 10]]
name [test]
value xmlstring [München]
value xmlchars [] eaten [3] charSizes []
With kind regards,
Mario Freimann
Siemens Aktiengesellschaft: Chairman of the Supervisory Board: Gerhard Cromme;
Managing Board: Peter Löscher, Chairman; Wolfgang Dehen, Heinrich Hiesinger, Joe
Kaeser, Jim Reid-Anderson, Hermann Requardt, Siegfried Russwurm, Peter Y.
Solmssen; Registered offices: Berlin and München; Commercial registries: Berlin
Charlottenburg, HRB 12300, München, HRB 6684; WEEE Reg. No. DE 23691322