S. Gross wrote:
Hi there,

I tried to parse a const char* via MemBufInputSource in the following way:

CString RecieveString = "...."; // Using Unicode and
                                // wchar_t as a built-in type

const char* gXMLInMemBuf = XMLString::transcode(RecieveString.GetString());
You should never transcode a Unicode string to the local code page, because you never know if it will support all of the characters in the string. Also, unless you explicitly set the encoding on the InputSource, the parser will assume either UTF-8 or whatever encoding is specified in the XML declaration.


MemBufInputSource* memBufIS = new MemBufInputSource
(
    (const XMLByte*)gXMLInMemBuf
    , static_cast<const XMLSize_t>(strlen(gXMLInMemBuf))
    , "test"
    , false
);

parser->parse(memBufIS); //Error on WinXP

This is working fine on my system (Vista Business 32). I am using VC++9 and Xerces 3.0.1.
You should check to see what your local code page is set to. It's probably UTF-8.


But I am running into trouble on Win XP 32. An error is thrown with the message:

error: invalid byte 't' at position 2 of a 4-byte sequence
I'm not sure what the local code page is on this machine, but it is certainly not UTF-8. It seems your document either has an explicit encoding declaration of UTF-8, or it doesn't have one at all, which implies UTF-8.

The DOMDocument* resulting from the parse was meant to be the input for xsd from CodeSynthesis. So I tried it the other way and put "gXMLInMemBuf" into a stringstream as input for xsd.

Different solution, same problem. I suppose this is an encoding problem I can't figure out (or maybe something else).

Any suggestions would be really helpful, because this is part of my work for my studies and I am trying to get it done as quickly as possible.
There's no need to do any transcoding of a UTF-16 string. In fact, the parser operates internally in UTF-16, so it's the most efficient representation:

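MemBufInputSource* memBufIS = new MemBufInputSource
(
    // The buffer is passed as raw bytes...
    reinterpret_cast<const XMLByte*>(RecieveString.GetString())
    // ...so the length must be a byte count, not a character count.
    , static_cast<XMLSize_t>(RecieveString.GetLength() * sizeof(wchar_t))
    , "test"
    , false
);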

If you have reason to believe the XML document in the CString instance has an encoding declaration that is not UTF-16, you should explicitly set the encoding for the InputSource:

memBufIS->setEncoding(L"UTF-16LE");

BTW, you've misspelled "receive" in your variable name.

Dave
