Martinke, Stefan wrote:
Hello,
I'm working on a project where we use an XML parser to get information about
specific devices. We are using the Xerces parser for parsing our documents. The
problem is that if we have special characters like the German 'ö' or '°' for
°C in our XML document, the transcode function will not transcode these strings
to a char*.
Our XML file looks like this:
<Device Name = "test"
ID = "1">
<Devicedetails Name= "Temperatur °C"
Temp = "20"/>
</Device>
...
Parsing of the file seems to work, but if I try to get the value of the
attributes, the parser crashes at the attribute with the special character.
I'm parsing the file like this:
(After loading and parsing the file, I take the root node.)
char* nodename = XMLString::transcode(rootnode->getNodeName());
Please search the archives of the mailing lists. You will find many
responses to this question. In short, you cannot rely on
XMLString::transcode(), which transcodes to the local code page, unless
you know all the characters you will process can be represented in the
local code page. In this particular case, they cannot.
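One way to avoid the local code page entirely is to convert the parser's UTF-16 output to UTF-8, which can represent every character. The following is only a minimal sketch in plain standard C++, not Xerces code: utf16ToUtf8 is a hypothetical helper, and it assumes the UTF-16 code units are available as char16_t (whether XMLCh is that type depends on how your Xerces was built). In a real program you would more likely use the transcoding services that ship with Xerces than hand-rolled code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Xerces-C++): converts a null-terminated
// UTF-16 string to UTF-8. Unpaired surrogates become U+FFFD.
std::string utf16ToUtf8(const char16_t* in)
{
    assert(in != nullptr);
    std::string out;
    for (std::size_t i = 0; in[i] != 0; ++i) {
        char32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: combine with the following low surrogate.
            char32_t lo = in[i + 1];
            if (lo >= 0xDC00 && lo <= 0xDFFF) {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                ++i;
            } else {
                cp = 0xFFFD; // unpaired high surrogate
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            cp = 0xFFFD; // unpaired low surrogate
        }

        if (cp < 0x80) {
            // One byte: plain ASCII.
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            // Two bytes: covers 'ö' (U+00F6) and '°' (U+00B0).
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            // Three bytes.
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            // Four bytes: code points outside the Basic Multilingual Plane.
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With this, the attribute value "Temperatur °C" comes out as a UTF-8 byte string in which '°' is the two bytes 0xC2 0xB0, regardless of the local code page.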
if(strcmp(nodename,"Device")==0)
{
char* devicename;
DOMNamedNodeMap * map = rootnode->getAttributes();
XMLCh* temp1 = XMLString::transcode("Name");
DOMNode *tmp1 = map->getNamedItem(temp1);
if(tmp1 !=NULL)
{
devicename = XMLString::transcode(tmp1->getNodeValue());
}
XMLString::release(temp1);
}
XMLString::release(nodename);
Transcoding constant strings like you're doing is extremely inefficient.
You should either transcode all of the strings you will use _to_
UTF-16 before you start parsing, or define constant UTF-16 strings at
compile time. There are examples of this in the file
src/xercesc/util/XMLUni.cpp
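As a rough illustration of the compile-time approach, here is a sketch in plain standard C++. XmlChar is a stand-in for XMLCh and is assumed to be a 16-bit UTF-16 code unit; kDevice, kName and equalsUtf16 are hypothetical names. XMLUni.cpp spells its constants out one code unit at a time in much the same way, and in Xerces itself you would compare with XMLString::equals() rather than a hand-written loop.

```cpp
#include <cassert>

// Stand-in for Xerces' XMLCh. Assumption: a 16-bit UTF-16 code unit.
typedef char16_t XmlChar;

// UTF-16 constants defined at compile time, one code unit per element,
// terminated by 0. Nothing is transcoded at run time.
static const XmlChar kDevice[] = { u'D', u'e', u'v', u'i', u'c', u'e', 0 };
static const XmlChar kName[]   = { u'N', u'a', u'm', u'e', 0 };

// Hypothetical helper: compares two null-terminated UTF-16 strings
// code unit by code unit. Both pointers must be non-null.
bool equalsUtf16(const XmlChar* a, const XmlChar* b)
{
    assert(a != nullptr && b != nullptr);
    while (*a != 0 && *a == *b) {
        ++a;
        ++b;
    }
    // Equal only if both strings ended at the same position.
    return *a == *b;
}
```

The node name returned by the parser can then be compared against kDevice directly, and kName can be passed to the attribute lookup, so neither the transcode() to char* nor the strcmp() is needed for these checks.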
...
I have already tried several things to solve my problem. I set the code page
to the German one by using
setlocale(LC_CTYPE, "de_DE.UTF-8");
in my main, but it didn't help.
You have no guarantee that changing the C locale will change how local
code page transcoding works. It's far better to keep data in your
application in Unicode at all times.
Dave