Hi,

Probably this has been asked already, but I did not find a reference.

According to the Xerces documentation, the internal representation of strings 
is UTF-16. I tried Xerces on OS X, where XMLCh is defined as a uint16_t, and 
attempted to convert from the current locale ("de-DE") to an XMLCh string 
representation by:

        char const* inputSource = "\U0001F600"; // U+1F600, a code point outside the BMP
        XMLCh* outputCharacter = XMLString::transcode(inputSource);

This obviously does not work: the output is a single character with the value 62976.
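
For reference, 62976 is 0xF600, which is exactly the low 16 bits of U+1F600, so 
the result is consistent with a plain narrowing copy. A minimal, self-contained 
sketch of that truncation (it does not use Xerces; the variable names are mine):

    #include <cstdint>
    #include <cstdio>

    int main()
    {
        char32_t smiley = U'\U0001F600';        // 0x0001F600, outside the BMP
        uint16_t truncated = (uint16_t)smiley;  // narrowing copy drops the high bits
        std::printf("%u\n", (unsigned)truncated); // prints 62976 (0xF600)
        return 0;
    }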

Looking at the XMLString::transcode method (based on 
IconvLCPTranscoder::transcode), it is obvious why this fails, because on OS X:
 - the size of wchar_t is 32 bits;
 - XMLString::transcode copies the output of mbsrtowcs (32-bit wide 
characters) into a 16-bit XMLCh buffer, as the excerpt below shows.

    while(true)
    {
        // len is a count of 32-bit wchar_t units
        size_t len = ::mbsrtowcs(tmpString + dstCursor, &src, resultSize - dstCursor, &st);
        if (len == TRANSCODING_ERROR)
        {
            dstCursor = 0;
            break;
        }
        dstCursor += len;
        if (src == 0) // conversion finished
            break;
        if (dstCursor >= resultSize - 1)
            reallocString<wchar_t>(tmpString, resultSize, manager, tmpString != localBuffer);
    }
    // make a final copy, converting from wchar_t to XMLCh;
    // the result string is allocated in 16-bit XMLCh units
    XMLCh* resultString = (XMLCh*)manager->allocate((dstCursor + 1) * sizeof(XMLCh));
    size_t i;
    for (i=0; i<dstCursor; ++i)
        resultString[i] = tmpString[i]; // each 32-bit value is narrowed to 16 bits
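
To preserve code points outside the BMP, that final copy would have to encode 
each 32-bit wchar_t value as a UTF-16 surrogate pair instead of narrowing it. 
As a rough sketch of what I mean (not the actual Xerces code; it assumes 
wchar_t holds UTF-32 code points, as it does on OS X, and that resultString 
was allocated with room for up to two XMLCh units per code point):

    size_t out = 0;
    for (size_t i = 0; i < dstCursor; ++i)
    {
        uint32_t cp = (uint32_t)tmpString[i];
        if (cp <= 0xFFFF)
        {
            resultString[out++] = (XMLCh)cp;                      // BMP: one code unit
        }
        else
        {
            cp -= 0x10000;                                        // supplementary plane
            resultString[out++] = (XMLCh)(0xD800 + (cp >> 10));   // high surrogate
            resultString[out++] = (XMLCh)(0xDC00 + (cp & 0x3FF)); // low surrogate
        }
    }
    resultString[out] = 0; // NUL-terminate; U+1F600 becomes D83D DE00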


Therefore, I have two questions:

1) is "configure" wrongly set up when it selects a uint16_t for XMLCh?
2) how, in any case, will the result be converted to a UTF-16 encoding?

Best regards,
Hartwig
