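Pass the data to the parser as UTF-16. With _UNICODE defined, the buffer
you hand to MemBufInputSource holds wchar_t data, not UTF-8, so the parser
needs to be told that it is UTF-16.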
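
To illustrate why the parser complains, here is a minimal, Xerces-free
sketch. It takes the raw bytes of a UTF-16 string and runs them through a
simplified UTF-8 structure check. The names rawBytes, looksLikeUtf8 and
selfCheck are made up for this example, and the check only looks at
lead/continuation bytes (it ignores overlong forms and surrogates), so it is
not what Xerces does internally. It uses char16_t rather than wchar_t so the
code unit size is the same on every platform. On the Xerces side, I believe
InputSource::setEncoding is the call that tells the parser what the buffer
really holds, but check the API docs for your version.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <vector>

// Copy the in-memory bytes of a UTF-16 string, which is roughly what a
// reinterpret_cast of the wide buffer hands to the parser.
std::vector<unsigned char> rawBytes(const std::u16string& s) {
    std::vector<unsigned char> out(s.size() * sizeof(char16_t));
    if (!out.empty()) std::memcpy(out.data(), s.data(), out.size());
    return out;
}

// Simplified UTF-8 check: every lead byte must be followed by the right
// number of continuation bytes (10xxxxxx). Assumption: this is enough to
// show the failure; a real decoder checks more than this.
bool looksLikeUtf8(const unsigned char* p, std::size_t n) {
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b = p[i];
        std::size_t extra = 0;
        if (b < 0x80) extra = 0;                    // single-byte (ASCII)
        else if ((b & 0xE0) == 0xC0) extra = 1;     // 2-byte sequence
        else if ((b & 0xF0) == 0xE0) extra = 2;     // 3-byte sequence
        else if ((b & 0xF8) == 0xF0) extra = 3;     // 4-byte sequence
        else return false;                          // stray continuation byte
        if (extra > n - i - 1) return false;        // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k)
            if ((p[i + k] & 0xC0) != 0x80) return false;  // bad continuation
        i += extra + 1;
    }
    return true;
}

// U+00C4 is stored as the code unit 0x00C4. Read as UTF-8, the byte 0xC4
// opens a 2-byte sequence, but the byte next to it is 0x00.
void selfCheck() {
    const std::u16string xml = u"<a>\u00C4</a>";
    const std::vector<unsigned char> raw = rawBytes(xml);
    assert(raw.size() == xml.size() * sizeof(char16_t));
    assert(!looksLikeUtf8(raw.data(), raw.size()));

    // The same document actually encoded as UTF-8 passes the check.
    const std::string utf8 = "<a>\xC3\x84</a>";
    assert(looksLikeUtf8(
        reinterpret_cast<const unsigned char*>(utf8.data()), utf8.size()));
}
```

Calling selfCheck() passes on both little- and big-endian machines, because
in either byte order 0xC4 is never followed by a valid continuation byte.
That is the same kind of failure as the "invalid byte 2 of a 2-byte
sequence" message quoted below.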
-----Original Message-----
From: Matt Holmes [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 23, 2005 1:51 PM
To: [email protected]
Subject: MemBufInputSource UTF-8 issue
Hello all,
I hope this is the correct place to ask this. I didn't want to ask on the
developers list as it doesn't seem like the proper place to ask questions
about end usage patterns.
In short, I am having an issue using the MemBufInputSource to take a chunk
of XML contained in a std::wstring and pass it to DOMBuilder::parse. I am
creating the input source as such:
    MemBufInputSource xmlSource(
        reinterpret_cast<const XMLByte *>(xml.to_utf8()),
        static_cast<const unsigned int>(xml.length() * sizeof(wchar_t)),
        "pidc_rules_file",
        true
    );
Ignore the 'to_utf8' call, that is just some encoding agnostic extensions we
have added to our std::wstring/std::string sub-class so we can actually
compile the code ANSI when required. I am telling the input source to adopt
the buffer that I pass to it, so that it isn't referencing back to the
std::wstring's internal buffer. Note in this case the code is compiled
_UNICODE, so the to_utf8 call is a simple pass-through to
std::wstring::c_str, there is no XMLString::transcode going on here.
When I attempt to call DOMBuilder::parse like so:
    m_Doc = m_Builder->parse(Wrapper4InputSource(&xmlSource, false));
I get the following error passed to my error handler:
An exception occurred! Type:UTFDataFormatException, Message:invalid byte 2
(│) of a 2-byte sequence.
Obviously it's not liking part of the UTF-8 sequence I am passing it, but
why? The XML itself is fine. I can tell my builder to load the document from
a file using parseURI, and all works well. For functionality reasons, I need
to be able to load an arbitrary chunk of XML outside the scope of loading a
file. Is there another, more intuitive way to do this aside from creating a
MemBufInputSource? Please note, I am doing XML Schema validation, so I would
like to stick to DOMBuilder if possible.
I am pretty new to Xerces (although not XML or XML parsing in general), so I
fear I am simply missing something obvious here.
Any thoughts?
Matt Holmes