S. Gross wrote:
David Bertoni wrote:
S. Gross wrote:
Hi there,
I tried to parse a const char* via MemBufInputSource in the following
way:
CString RecieveString = "....";  // Using Unicode and
                                 // wchar_t as a built-in type
const char* gXMLInMemBuf =
XMLString::transcode(RecieveString.GetString());
You should never transcode a Unicode string to the local code page,
because you never know if it will support all of the characters in the
string. Also, unless you explicitly set the encoding on the
InputSource, the parser will assume either UTF-8 or whatever encoding
is specified in the XML declaration.
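To illustrate why transcoding to a local code page can lose data, here is a minimal sketch in standard C++. It does not call Xerces; `toLatin1Lossy` is a hypothetical helper that narrows UTF-16 text to Latin-1, standing in for whatever the local code page happens to be. A real transcoder may behave differently for unsupported characters.

```cpp
#include <string>

// Hypothetical helper, not part of Xerces: narrows UTF-16 text to a
// single-byte code page (Latin-1 is assumed here) the way a lossy
// transcoder might.
std::string toLatin1Lossy(const std::u16string& text)
{
    std::string out;
    out.reserve(text.size());
    for (char16_t unit : text)
    {
        // Latin-1 can only represent U+0000..U+00FF; anything else is
        // replaced, so the original character cannot be recovered.
        out.push_back(unit <= 0xFF ? static_cast<char>(unit) : '?');
    }
    return out;
}
```

With this sketch, plain ASCII input survives unchanged, while a character such as the euro sign (U+20AC) comes back as '?'.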
MemBufInputSource* memBufIS = new MemBufInputSource
(
(const XMLByte*)gXMLInMemBuf
, static_cast<const XMLSize_t>(strlen(gXMLInMemBuf))
, "test"
, false
);
parser->parse(memBufIS); //Error on WinXP
This is working fine on my system (Vista Business 32). I am using
VC++9 and Xerces 3.0.1.
You should check to see what your local code page is set to. It's
probably UTF-8.
But I am running into trouble on Win XP 32. An error is thrown with
the message:
error: invalid byte 't' at position 2 of a 4-byte sequence
I'm not sure what the local code page is on this machine, but it is
certainly not UTF-8. It seems your document either has an explicit
encoding declaration of UTF-8, or it doesn't have one at all, which
implies UTF-8.
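One way an error of this shape can arise, sketched in standard C++ rather than with the actual Xerces decoder: in Windows-1252 the character 'ô' is the single byte 0xF4, which a UTF-8 decoder reads as the lead byte of a 4-byte sequence. If the next byte is a plain 't', the sequence is broken at its second byte. The function name `firstInvalidUtf8` is hypothetical, and whether this is what happened with the original data is only an assumption.

```cpp
#include <cstddef>
#include <string>

// Simplified sketch, not the Xerces decoder: returns the index of the first
// byte that breaks UTF-8 structure, or std::string::npos if none does.
// It checks only lead/continuation bit patterns, not overlong forms or
// surrogate ranges.
std::size_t firstInvalidUtf8(const std::string& bytes)
{
    std::size_t i = 0;
    while (i < bytes.size())
    {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        std::size_t length = 0;
        if (lead < 0x80)                length = 1; // ASCII
        else if ((lead & 0xE0) == 0xC0) length = 2;
        else if ((lead & 0xF0) == 0xE0) length = 3;
        else if ((lead & 0xF8) == 0xF0) length = 4;
        else return i; // stray continuation byte or invalid lead byte

        for (std::size_t k = 1; k < length; ++k)
        {
            if (i + k >= bytes.size())
                return i; // sequence truncated at end of input
            const unsigned char next =
                static_cast<unsigned char>(bytes[i + k]);
            if ((next & 0xC0) != 0x80)
                return i + k; // expected a continuation byte (10xxxxxx)
        }
        i += length;
    }
    return std::string::npos;
}
```

For the bytes of "hôtel" encoded in Windows-1252, the function reports the 't' that follows the 0xF4 byte, while pure ASCII input passes.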
The resulting DOMDocument* from parsing was intended as input for xsd
from CodeSynthesis. So I tried it the other way and put
"gXMLInMemBuf" into a stringstream as input for xsd.
Different solution, same problem. I suppose that this is an encoding
problem that I can't figure out (or maybe something different).
Any suggestions would be really helpful, because this is part of my
coursework and I am trying to get on with it as fast as possible.
There's no need to do any transcoding of a UTF-16 string. In fact,
the parser operates internally in UTF-16, so it's the most efficient
representation:
MemBufInputSource* memBufIS = new MemBufInputSource
(
RecieveString.GetString()
, static_cast<const XMLSize_t>(RecieveString.GetLength())
, "test"
, false
);
If you have reason to believe the XML document in the CString instance
has an encoding declaration that is not UTF-16, you should explicitly
set the encoding for the InputSource:
memBufIS->setEncoding(L"UTF-16LE");
BTW, you've mis-spelled "receive" in your variable name.
Dave
Thanks for this quick response!
I tried your suggestion, but it did not work because of a compiler
error saying that it is not possible to convert "const wchar_t*" to
"const XMLByte*".
Yes, a typo in my code snippet:
reinterpret_cast<const XMLByte*>(RecieveString.GetString());
I tried it by brute force with a cast to "const XMLByte*", but don't I
need to multiply the length by 2 because of the UTF-16 encoding?
Sigh... Yes, it should be:
static_cast<const XMLSize_t>(RecieveString.GetLength() * sizeof(wchar_t))
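The two corrections above (the pointer cast and the byte count) can be sketched together in standard C++. This is an illustration under assumptions, not the code from the thread: `std::wstring` stands in for `CString`, and `ByteView` and `asBytes` are hypothetical names. Note that `wchar_t` is 2 bytes on Windows, where the data is UTF-16; on many other platforms it is 4 bytes, so the UTF-16 assumption holds only on Windows.

```cpp
#include <cstddef>
#include <string>

// Hypothetical view over a wide string's storage as raw bytes.
struct ByteView
{
    const unsigned char* data; // same role as const XMLByte*
    std::size_t size;          // length in bytes, not in characters
};

// std::wstring stands in for CString here (an assumption for portability).
ByteView asBytes(const std::wstring& text)
{
    return ByteView{
        // Reinterpret the wide-character storage as bytes; no copy is made,
        // so the view is valid only while 'text' is alive and unchanged.
        reinterpret_cast<const unsigned char*>(text.data()),
        // Character count times the size of one code unit gives the bytes.
        text.size() * sizeof(wchar_t)
    };
}
```

The resulting pointer and size correspond to the first two arguments passed to the MemBufInputSource constructor in the snippet above.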
It worked with memBufIS->setEncoding(L"UTF-8") and my old setup at home.
I have to try that at the lab on Tuesday. I can assure you that my CString
contains proper characters in English and nothing else. That should work
in Germany too.
I wouldn't bet on it. If you want your application to be robust, you
should use the UTF-16 version of the data and force the encoding on the
InputSource. It will also avoid the extra CPU cycles and memory
allocation to transcode the data.
If you decide to keep the call to XMLString::transcode(), don't forget
to call XMLString::release(&gXMLInMemBuf) when you're done with the data.
Dave