pundog wrote:
Thanks for the quick reply,
I searched the archive and came up with this:
const XMLCh* text = MY_HEBREW_TEXT; // initialized by the parser
XMLTranscoder* utf8Transcoder;
XMLTransService::Codes failReason;
utf8Transcoder =
    XMLPlatformUtils::fgTransService->makeNewTranscoderFor("UTF-8", failReason,
                                                           16*1024);
int len = XMLString::stringLen(text);
XMLByte* utf8 = new XMLByte(); // ?
unsigned int eaten;
unsigned int utf8Len = utf8Transcoder->transcodeTo(text, len, utf8, len,
    eaten, XMLTranscoder::UnRep_Throw);
utf8[utf8Len] = '\0';
string str = (char*)utf8;
return str;
It looks like it works, but the problem is that I'm getting a serious memory
leak from this code, and I don't really know why.
I tried to delete the XMLByte*, but when I do that, I get a nasty
exception.
The transcoder does not allocate a target buffer for transcoding. Please
make sure you read the comments for any functions you try to use:
/** Converts from the encoding of the service to the internal XMLCh* encoding
*
* @param srcData the source buffer to be transcoded
* @param srcCount number of bytes in the source buffer
* @param toFill the destination buffer
* @param maxChars the max number of characters in the destination buffer
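Since you allocated a single byte but passed len as the maximum size of the
destination buffer, your code suffers from a buffer overrun. The exception is
probably a result of your code trashing some heap control information. Or
perhaps the form of delete you used didn't match the form of new: a single
object from "new" must be released with "delete", and an array from "new []"
must be released with "delete []".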
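The ownership contract described above (the caller allocates the destination and must pass its real capacity) can be illustrated without Xerces. This is only a sketch: fillAtMost is a hypothetical stand-in for a transcodeTo-style call, not the Xerces API, and std::vector is swapped in for the raw new []/delete [] pair so that the delete form cannot be mismatched. The allocationForms function simply shows the two raw forms side by side.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a transcodeTo-style call (NOT the Xerces API):
// it writes at most maxBytes bytes into toFill and returns how many it wrote.
// Like the transcoder, it never allocates the destination buffer itself.
std::size_t fillAtMost(const std::string& src, unsigned char* toFill,
                       std::size_t maxBytes) {
    std::size_t n = src.size() < maxBytes ? src.size() : maxBytes;
    for (std::size_t i = 0; i < n; ++i) {
        toFill[i] = static_cast<unsigned char>(src[i]);
    }
    return n;
}

// Caller side: allocate the destination first, then pass its real capacity.
std::string copyOut(const std::string& src) {
    // One extra slot for the terminator, mirroring utf8[utf8Len] = '\0'.
    std::vector<unsigned char> buf(src.size() + 1);
    std::size_t written = fillAtMost(src, buf.data(), buf.size() - 1);
    buf[written] = '\0';
    // The vector releases its storage on return, so there is no
    // delete versus delete [] choice to get wrong.
    return std::string(reinterpret_cast<const char*>(buf.data()), written);
}

// The two raw-allocation forms, for comparison.
void allocationForms() {
    unsigned char* one = new unsigned char();     // ONE element: free with delete
    delete one;
    unsigned char* many = new unsigned char[16];  // 16 elements: free with delete []
    delete [] many;
}
```

The key detail is passing buf.size() - 1 rather than an unrelated length: the count handed to the callee must describe memory that was actually allocated.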
Search the Xerces-C source for other uses of this functionality. If you want
reasonable efficiency, it's more complicated than just making a single call
to the transcoder.
If you want a simple but potentially inefficient implementation, you can
just assume 4 bytes of UTF-8 for every XMLCh of the input and allocate a
buffer accordingly.
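To see why 4 bytes per XMLCh is a safe bound, here is a minimal hand-written UTF-16 to UTF-8 encoder. It is an illustration of the sizing argument under stated assumptions, not the Xerces transcoder: utf16ToUtf8 and toUtf8 are hypothetical names, char16_t stands in for XMLCh, and std::vector replaces new []/delete []. A BMP character (one UTF-16 unit) needs at most 3 bytes of UTF-8, and a surrogate pair (two units) needs 4, so no unit ever needs more than 4 bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Minimal UTF-16 -> UTF-8 encoder, written only to illustrate buffer sizing.
// It is NOT the Xerces transcoder. A surrogate pair is combined into one code
// point; an unpaired surrogate is simply encoded as a 3-byte sequence.
std::size_t utf16ToUtf8(const char16_t* src, std::size_t srcCount,
                        unsigned char* toFill, std::size_t maxBytes) {
    std::size_t out = 0;
    for (std::size_t i = 0; i < srcCount; ++i) {
        std::uint32_t cp = src[i];
        std::size_t units = 1;
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < srcCount &&
            src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i + 1] - 0xDC00u);
            units = 2;
        }
        unsigned char tmp[4];
        std::size_t n;
        if (cp < 0x80) {
            tmp[0] = static_cast<unsigned char>(cp);
            n = 1;
        } else if (cp < 0x800) {
            tmp[0] = static_cast<unsigned char>(0xC0 | (cp >> 6));
            tmp[1] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
            n = 2;
        } else if (cp < 0x10000) {
            tmp[0] = static_cast<unsigned char>(0xE0 | (cp >> 12));
            tmp[1] = static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F));
            tmp[2] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
            n = 3;
        } else {
            tmp[0] = static_cast<unsigned char>(0xF0 | (cp >> 18));
            tmp[1] = static_cast<unsigned char>(0x80 | ((cp >> 12) & 0x3F));
            tmp[2] = static_cast<unsigned char>(0x80 | ((cp >> 6) & 0x3F));
            tmp[3] = static_cast<unsigned char>(0x80 | (cp & 0x3F));
            n = 4;
        }
        // Stop rather than write past the end of the caller's buffer.
        if (out + n > maxBytes) break;
        for (std::size_t k = 0; k < n; ++k) toFill[out++] = tmp[k];
        i += units - 1;  // skip the low surrogate if a pair was consumed
    }
    return out;
}

std::string toUtf8(const std::u16string& text) {
    // Simple worst-case sizing: 4 bytes per UTF-16 unit plus the terminator.
    // For well-formed UTF-16, 3 bytes per unit would already be enough.
    const std::size_t maxBytes = text.size() * 4;
    std::vector<unsigned char> buf(maxBytes + 1);
    std::size_t len = utf16ToUtf8(text.data(), text.size(), buf.data(), maxBytes);
    buf[len] = '\0';
    return std::string(reinterpret_cast<const char*>(buf.data()), len);
}
```

For example, the Hebrew letter alef (U+05D0) comes out as the two bytes 0xD7 0x90, well within the 4 + 1 bytes allocated for a one-character input.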
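size_t len = XMLString::stringLen(text);
// Worst case: 4 bytes of UTF-8 per XMLCh, plus 1 byte for the terminator.
// Note the square brackets: this allocates an array, not a single byte.
XMLByte* utf8 = new XMLByte[(len * 4) + 1];
unsigned int eaten;
unsigned int utf8Len = utf8Transcoder->transcodeTo(text, len, utf8, len * 4,
    eaten, XMLTranscoder::UnRep_Throw);
utf8[utf8Len] = '\0';
string str = (char*)utf8;
delete [] utf8;  // array new must be paired with array delete
return str;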
Dave