S. Gross wrote:
David Bertoni wrote:
S. Gross wrote:
Hi there,

I tried to parse a const char* via MemBufInputSource in the following way:

CString RecieveString = "....";  // Using Unicode and
                                 // wchar_t as a built-in type

const char* gXMLInMemBuf = XMLString::transcode(RecieveString.GetString());
You should never transcode a Unicode string to the local code page, because you never know if it will support all of the characters in the string. Also, unless you explicitly set the encoding on the InputSource, the parser will assume either UTF-8 or whatever encoding is specified in the XML declaration.
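To illustrate the point about code pages, here is a minimal sketch that uses Latin-1 (ISO-8859-1) as a stand-in for "the local code page". The helper `narrowToLatin1` is hypothetical and is not part of Xerces; it only shows that narrowing succeeds for characters the code page contains and fails for anything else.

```cpp
#include <string>

// Hypothetical helper (not a Xerces API): narrow a wide string to Latin-1,
// which stands in here for "the local code page".
// Returns false as soon as a character has no single-byte representation.
bool narrowToLatin1(const std::wstring& in, std::string& out)
{
    out.clear();
    for (wchar_t ch : in) {
        if (static_cast<unsigned long>(ch) > 0xFFul) {
            return false;  // character does not exist in this code page
        }
        out.push_back(static_cast<char>(static_cast<unsigned char>(ch)));
    }
    return true;
}
```

A document containing only ASCII survives the round trip, while one containing, say, a euro sign (U+20AC) cannot be represented at all. A real transcoder may substitute a replacement character instead of failing, which is arguably worse because the loss goes unnoticed.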


MemBufInputSource* memBufIS = new MemBufInputSource
(
    (const XMLByte*)gXMLInMemBuf
    , static_cast<const XMLSize_t>(strlen(gXMLInMemBuf))
    , "test"
    , false
);

parser->parse(memBufIS); //Error on WinXP

This is working fine on my system (Vista Business 32). I am using VC++9 and Xerces 3.0.1.
You should check to see what your local code page is set to. It's probably UTF-8.


But I am running into trouble on Win XP 32. An error is thrown with the message:

error: invalid byte 't' at position 2 of a 4-byte sequence
I'm not sure what the local code page is on this machine, but it is certainly not UTF-8. It seems your document either has an explicit encoding declaration of UTF-8, or it doesn't have one at all, which implies UTF-8.
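One way to read that error message: in UTF-8, the lead byte announces how many bytes the sequence has, and every following byte must look like 10xxxxxx. A byte from a single-byte code page (for example 0xF6, which is "ö" in Windows-1252) looks like the start of a 4-byte sequence, so the plain ASCII 't' after it is rejected at position 2. The sketch below is only an illustration of that rule; the function names are hypothetical, it is not the Xerces decoder, and which byte actually triggered the error in the original document is unknown.

```cpp
#include <cstddef>
#include <string>

// Expected length of a UTF-8 sequence, judged by its lead byte.
// Returns 0 for a byte that cannot start a sequence.
// Assumption: 0xF0-0xF7 counts as a 4-byte lead (the original bit layout);
// a strict modern decoder would additionally reject 0xF5-0xF7.
std::size_t utf8SequenceLength(unsigned char lead)
{
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;                             // continuation byte or invalid
}

// Returns the 1-based position, within the sequence starting at `start`,
// of the first byte that is not acceptable, or 0 if the sequence is well formed.
// `start` must be a valid index into `bytes`.
std::size_t firstBadPosition(const std::string& bytes, std::size_t start)
{
    const std::size_t len =
        utf8SequenceLength(static_cast<unsigned char>(bytes[start]));
    if (len == 0) return 1;  // the lead byte itself is invalid
    for (std::size_t i = 1; i < len; ++i) {
        if (start + i >= bytes.size()) return i + 1;  // sequence is truncated
        const unsigned char b = static_cast<unsigned char>(bytes[start + i]);
        if ((b & 0xC0) != 0x80) return i + 1;         // not 10xxxxxx
    }
    return 0;
}
```

With the input bytes 0xF6 't', `firstBadPosition` reports position 2, which matches the shape of the message above.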

The DOMDocument* resulting from parsing was intended as input for xsd from CodeSynthesis. So I tried it the other way and put "gXMLInMemBuf" into a stringstream as input for xsd.

Different solution, same problem. I suppose this is an encoding problem that I can't figure out (or maybe something different).

Any suggestions would be really helpful, because this is part of my coursework and I am trying to get it working as quickly as possible.
There's no need to do any transcoding of a UTF-16 string. In fact, the parser operates internally in UTF-16, so it's the most efficient representation:

MemBufInputSource* memBufIS = new MemBufInputSource
(
    RecieveString.GetString()
    , static_cast<const XMLSize_t>(RecieveString.GetLength())
    , "test"
    , false
);

If you have reason to believe the XML document in the CString instance has an encoding declaration that is not UTF-16, you should explicitly set the encoding for the InputSource:

memBufIS->setEncoding(L"UTF-16LE");

BTW, you've mis-spelled "receive" in your variable name.

Dave


Thanks for this quick response!

I tried your suggestion, but it did not work: the compiler reported that it cannot convert "const wchar_t*" to "const XMLByte*".
Yes, a typo in my code snippet:

    reinterpret_cast<const XMLByte*>(RecieveString.GetString());

I tried brute force with a cast to "const XMLByte*", but don't I need to multiply the length by 2 because of the UTF-16 encoding?
Sigh...  Yes, it should be:

static_cast<const XMLSize_t>(RecieveString.GetLength() * sizeof(wchar_t))
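Putting the two corrections together (the pointer cast and the byte count), here is a minimal sketch that compiles without MFC or Xerces. `std::wstring` stands in for `CString`, `unsigned char` stands in for `XMLByte`, and `ByteView` and `asBytes` are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <string>

// Hypothetical pair holding what would be handed to MemBufInputSource.
struct ByteView {
    const unsigned char* data;  // would be passed as const XMLByte*
    std::size_t size;           // would be passed as the byte count
};

// View a wide string as raw bytes without copying or transcoding.
// The returned pointer is only valid while `text` is alive and unmodified.
ByteView asBytes(const std::wstring& text)
{
    return ByteView{
        reinterpret_cast<const unsigned char*>(text.c_str()),
        text.length() * sizeof(wchar_t)  // characters -> bytes
    };
}
```

Using `sizeof(wchar_t)` rather than a literal 2 keeps the count right on platforms where `wchar_t` is not 16 bits wide, although on such platforms the data would not be UTF-16 and the encoding set on the input source would have to change accordingly.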


It worked with memBufIS->setEncoding(L"UTF-8") and my old setup at home.
I have to try that at the lab on Tuesday. I can assure you that my CString contains only proper English characters and nothing else. That should work in Germany too.
I wouldn't bet on it. If you want your application to be robust, you should use the UTF-16 version of the data and force the encoding on the InputSource. It will also avoid the extra CPU cycles and memory allocation to transcode the data.

If you decide to keep the call to XMLString::transcode(), don't forget to call XMLString::release(&gXMLInMemBuf) when you're done with the data.

Dave
