Martinke, Stefan wrote:
Hello,

I'm working on a project where we use an XML parser to get information about 
specific devices. We are using the Xerces parser for parsing our documents. The 
problem is that if we have special characters like the German 'ö' or '°' (for 
°C) in our XML document, the transcode function will not transcode these strings 
to a char*.

Our XML-File looks like this:

<Device Name = "test"
         ID = "1">
        <Devicedetails Name= "Temperatur °C"
                          Temp = "20"/>
</Device>
...

Parsing of the file seems to work, but if I try to get the values of the 
attributes, the parser crashes at the attribute with the special character.
I'm parsing the file like this:
(After loading and parsing the file I'm taking the root node)
char* nodename = XMLString::transcode(rootnode->getNodeName());

Please search the archives of the mailing lists. You will find many responses to this question. In short, you cannot rely on XMLString::transcode(), which transcodes to the local code page, unless you know all the characters you will process can be represented in the local code page. In this particular case, they cannot.
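
As a rough illustration of the alternative, here is a minimal sketch that converts UTF-16 to UTF-8 by hand instead of going through the local code page. It assumes XMLCh is a 16-bit UTF-16 code unit (char16_t stands in for it here, which depends on how Xerces was built), and toUtf8 is a hypothetical helper, not a Xerces function.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Xerces-C++).
// Assumption: the input is a null-terminated UTF-16 string, as XMLCh* is
// when XMLCh is a 16-bit type.
std::string toUtf8(const char16_t* s)
{
    std::string out;
    if (s == nullptr) return out;

    for (std::size_t i = 0; s[i] != 0; ++i) {
        char32_t cp = s[i];

        if (cp >= 0xD800 && cp <= 0xDBFF &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            // Combine a high/low surrogate pair into one code point.
            cp = 0x10000 + ((cp - 0xD800) << 10) + (s[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xD800 && cp <= 0xDFFF) {
            // Unpaired surrogate: substitute U+FFFD REPLACEMENT CHARACTER.
            cp = 0xFFFD;
        }

        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The result holds UTF-8 bytes, so a value such as "Temperatur °C" survives whatever the local code page is. Whether a terminal or GUI then displays those bytes correctly is a separate question.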

if(strcmp(nodename,"Device")==0)
{
  char* devicename;
  DOMNamedNodeMap * map = rootnode->getAttributes();
  XMLCh* temp1 = XMLString::transcode("Name");
  DOMNode *tmp1 = map->getNamedItem(temp1);
  if(tmp1 !=NULL)
  {
    devicename = XMLString::transcode(tmp1->getNodeValue());
  }
  XMLString::release(temp1);
}
XMLString::release(nodename);

Transcoding constant strings like you're doing is extremely inefficient. You should either transcode all of the strings you will use _to_ UTF-16 before you start parsing, or define constant UTF-16 strings at compile time. There are examples of this in the file src/xercesc/util/XMLUni.cpp.
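
A minimal sketch of the compile-time approach, again assuming XMLCh is a 16-bit UTF-16 code unit (char16_t stands in for it here). The names kDevice, kName and equalsUtf16 are hypothetical and chosen only for illustration.

```cpp
// Constant UTF-16 strings spelled out one code unit at a time, so nothing
// has to be transcoded at run time. Assumption: char16_t matches XMLCh.
static const char16_t kDevice[] = { u'D', u'e', u'v', u'i', u'c', u'e', 0 };
static const char16_t kName[]   = { u'N', u'a', u'm', u'e', 0 };

// Hypothetical helper: compares two null-terminated UTF-16 strings
// code unit by code unit. Two null pointers compare equal.
bool equalsUtf16(const char16_t* a, const char16_t* b)
{
    if (a == nullptr || b == nullptr) return a == b;
    while (*a != 0 && *a == *b) {
        ++a;
        ++b;
    }
    return *a == *b;
}
```

With constants like these, the node name can be compared directly in UTF-16 and the attribute can be looked up with kName, with no transcode/release pair for either. In Xerces itself, XMLString::equals on two XMLCh* strings should play the role of the helper above.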

...

I have already tried several things to solve my problem. I set the locale 
to the German one by calling
setlocale(LC_CTYPE, "de_DE.UTF-8");
in my main(), but it didn't help.

You have no guarantee that changing the C locale will change how local code page transcoding works. It's far better to keep data in your application in Unicode at all times.

Dave
