S. Gross wrote:
Hi there,
I tried to parse a const char* via MemBufInputSource in the following way:
CString RecieveString = "...."; // Using Unicode and
                                // wchar_t as a built-in type
const char* gXMLInMemBuf = XMLString::transcode(RecieveString.GetString());
You should never transcode a Unicode string to the local code page,
because you never know if it will support all of the characters in the
string. Also, unless you explicitly set the encoding on the InputSource,
the parser will assume either UTF-8 or whatever encoding is specified in
the XML declaration.
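If a narrow buffer really is needed, UTF-8 is a safer target than the local code page: it can represent every character, and it is what the parser assumes when nothing else is specified. A minimal sketch of the conversion in plain C++ follows; utf16_to_utf8 is a hypothetical helper, not a Xerces API, and it assumes the input is well-formed UTF-16 (it throws on unpaired surrogates).

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Encode a UTF-16 string as UTF-8. Unlike transcoding to the local
// ANSI code page, this cannot drop characters.
std::string utf16_to_utf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        std::uint32_t cp = in[i];
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate.
            if (i + 1 >= in.size() || in[i + 1] < 0xDC00 || in[i + 1] > 0xDFFF)
                throw std::runtime_error("unpaired high surrogate");
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            throw std::runtime_error("unpaired low surrogate");
        }
        if (cp < 0x80) {                      // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {              // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {            // 3 bytes
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                              // 4 bytes
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

The resulting bytes and their size() could then be handed to MemBufInputSource, provided the document does not declare some other encoding in its XML declaration.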
MemBufInputSource* memBufIS = new MemBufInputSource
(
(const XMLByte*)gXMLInMemBuf
, static_cast<const XMLSize_t>(strlen(gXMLInMemBuf))
, "test"
, false
);
parser->parse(*memBufIS); // Error on WinXP
This is working fine on my system (Vista Business 32). I am using VC++9
and Xerces 3.0.1.
You should check to see what your local code page is set to. It's
probably UTF-8.
But I am running into trouble on Win XP 32. An error is thrown with the
message:
error: invalid byte 't' at position 2 of a 4-byte sequence
I'm not sure what the local code page is on this machine, but it is
certainly not UTF-8. It seems your document either has an explicit
encoding declaration of UTF-8, or it doesn't have one at all, which
implies UTF-8.
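The message is what a UTF-8 decoder reports when a lead byte announces a multi-byte sequence and a following byte is not a continuation byte (10xxxxxx). A sketch of that check is below; check_utf8 and the sample bytes are hypothetical, and this is not Xerces's actual decoder. It assumes, purely for illustration, that the text contained "ö", which Windows-1252 encodes as the single byte 0xF6, whose high bits (11110xxx) mark it as the start of a 4-byte sequence.

```cpp
#include <cstddef>
#include <string>

struct Utf8Check {
    bool ok;               // true if the whole buffer is well-formed
    std::size_t offset;    // offset of the offending sequence's lead byte
    std::size_t position;  // 1-based position of the bad byte in that sequence
    std::size_t seq_len;   // sequence length announced by the lead byte
};

// Sequence length judged only from the lead byte's high bits
// (0 = cannot start a sequence). Strict UTF-8 (RFC 3629) additionally
// rejects 0xF5-0xFF, which this sketch does not do.
static std::size_t expected_length(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;
}

Utf8Check check_utf8(const std::string& bytes) {
    for (std::size_t i = 0; i < bytes.size();) {
        std::size_t len = expected_length(static_cast<unsigned char>(bytes[i]));
        if (len == 0) return {false, i, 1, 0};
        for (std::size_t k = 1; k < len; ++k) {
            // Running out of input, or a byte not of the form 10xxxxxx,
            // breaks the sequence.
            if (i + k >= bytes.size() ||
                (static_cast<unsigned char>(bytes[i + k]) & 0xC0) != 0x80)
                return {false, i, k + 1, len};
        }
        i += len;
    }
    return {true, 0, 0, 0};
}
```

For the bytes "K\xF6tter" this reports a bad byte 't' at position 2 of a 4-byte sequence, the same shape as the error above, while the UTF-8 spelling "K\xC3\xB6tter" passes.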
The DOMDocument* resulting from the parse was intended as input for xsd
from CodeSynthesis. So I tried it the other way and put "gXMLInMemBuf"
into a stringstream as input for xsd.
Different solution, same problem. I suspect this is an encoding problem
I can't figure out (or maybe something else).
Any suggestions would be really helpful, because this is part of my work
for my studies and I am trying to get it done as fast as possible.
There's no need to do any transcoding of a UTF-16 string. In fact, the
parser operates internally in UTF-16, so it's the most efficient
representation:
MemBufInputSource* memBufIS = new MemBufInputSource
(
reinterpret_cast<const XMLByte*>(RecieveString.GetString())
, static_cast<XMLSize_t>(RecieveString.GetLength() * sizeof(XMLCh))
, "test"
, false
);
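Note that the MemBufInputSource constructor's second argument is a byte count. CString::GetLength() returns the number of UTF-16 code units, each of which occupies two bytes, so passing it unchanged would hand the parser only half of the buffer. A small portable sketch of the byte layout involved (utf16le_bytes is a hypothetical helper; on x86 Windows, wchar_t is a 16-bit little-endian type, so the CString buffer already looks like this in memory):

```cpp
#include <string>
#include <vector>

// Lay out a UTF-16 string as UTF-16LE bytes: two bytes per code unit,
// low byte first. The byte count is therefore size() * 2.
std::vector<unsigned char> utf16le_bytes(const std::u16string& s) {
    std::vector<unsigned char> out;
    out.reserve(s.size() * 2);
    for (char16_t u : s) {
        out.push_back(static_cast<unsigned char>(u & 0xFF));         // low byte
        out.push_back(static_cast<unsigned char>((u >> 8) & 0xFF));  // high byte
    }
    return out;
}
```

For example, u"<a/>" is 4 code units but 8 bytes, starting with 0x3C 0x00.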
If you have reason to believe the XML document in the CString instance
has an encoding declaration that is not UTF-16, you should explicitly
set the encoding for the InputSource:
memBufIS->setEncoding(L"UTF-16LE");
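The explicit encoding helps because a parser must guess the encoding from the first few bytes before it can read any declaration. A rough sketch of such a guess, following a few cases from the non-normative Appendix F of the XML 1.0 specification, is below; sniff_encoding is a hypothetical name, UTF-32 and EBCDIC cases are omitted, and Xerces's own detection may differ in detail.

```cpp
#include <string>
#include <vector>

// Guess an encoding from the leading bytes of a document.
std::string sniff_encoding(const std::vector<unsigned char>& b) {
    if (b.size() >= 2 && b[0] == 0xFF && b[1] == 0xFE)
        return "UTF-16LE";  // byte order mark, little-endian
    if (b.size() >= 2 && b[0] == 0xFE && b[1] == 0xFF)
        return "UTF-16BE";  // byte order mark, big-endian
    if (b.size() >= 4 && b[0] == 0x3C && b[1] == 0x00 && b[2] == 0x3F && b[3] == 0x00)
        return "UTF-16LE";  // "<?" without a BOM, little-endian
    if (b.size() >= 4 && b[0] == 0x00 && b[1] == 0x3C && b[2] == 0x00 && b[3] == 0x3F)
        return "UTF-16BE";  // "<?" without a BOM, big-endian
    return "UTF-8";         // nothing recognized: fall back to the default
}
```

In this sketch, a UTF-16LE buffer that begins directly with an element such as "<root>" matches none of the cases and falls through to UTF-8, which is exactly the kind of guess that setEncoding makes unnecessary.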
BTW, you've mis-spelled "receive" in your variable name.
Dave