Hi Matt,
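If you are filling the buffer with wchar_t characters but the string contains
the declaration <?xml encoding="utf-8"?>, the declared encoding no longer
matches the bytes, and the parser appears to decode the UTF-16 data as UTF-8.
You need to force UTF-16 as the actual encoding of the buffer by calling
xmlSource.setEncoding("UTF-16LE"). Note that setEncoding takes an XMLCh
string, so pass XMLUni::fgUTF16LEncodingString (or a transcoded string)
rather than a narrow char literal.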
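
To make the mismatch concrete, here is a minimal, Xerces-free sketch of what the bytes look like. It assumes a little-endian platform with a 2-byte wchar_t (as on Windows) and uses char16_t as a stand-in so it behaves the same elsewhere. The helper names utf16le_bytes and is_structurally_utf8 are made up for this illustration, and the check is a simplified structural one, not the decoder Xerces actually uses.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Lay out UTF-16 code units as little-endian bytes. This mirrors what a
// reinterpret_cast of a 2-byte wchar_t buffer exposes on a little-endian
// machine (assumption: char16_t stands in for such a wchar_t).
std::vector<unsigned char> utf16le_bytes(const std::u16string& text) {
    std::vector<unsigned char> bytes;
    bytes.reserve(text.size() * 2);
    for (char16_t unit : text) {
        bytes.push_back(static_cast<unsigned char>(unit & 0xFF)); // low byte first
        bytes.push_back(static_cast<unsigned char>(unit >> 8));   // then high byte
    }
    return bytes;
}

// Simplified structural UTF-8 check: each lead byte must be followed by the
// expected number of 10xxxxxx continuation bytes. Overlong forms and
// surrogates are deliberately not rejected here.
bool is_structurally_utf8(const std::vector<unsigned char>& bytes) {
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = bytes[i];
        std::size_t extra = 0;
        if (lead < 0x80) {
            extra = 0;                       // single-byte (ASCII) character
        } else if ((lead & 0xE0) == 0xC0) {
            extra = 1;                       // 2-byte sequence
        } else if ((lead & 0xF0) == 0xE0) {
            extra = 2;                       // 3-byte sequence
        } else if ((lead & 0xF8) == 0xF0) {
            extra = 3;                       // 4-byte sequence
        } else {
            return false;                    // stray continuation or invalid lead
        }
        if (i + extra >= bytes.size()) {
            return false;                    // sequence is truncated
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            if ((bytes[i + k] & 0xC0) != 0x80) {
                return false;                // not a continuation byte
            }
        }
        i += extra + 1;
    }
    return true;
}
```

With these helpers, the UTF-16LE bytes of a document containing a non-ASCII character such as U+00B1 fail the check, while the same character encoded as real UTF-8 (0xC2 0xB1) passes. Pure ASCII content still passes structurally, but every other byte is then a NUL, so it is not the text the declaration promises either.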
Hope this helps,
Alberto
At 15.51 23/06/2005 -0500, Matt Holmes wrote:
Hello all,
I hope this is the correct place to ask this. I didn't want to ask on the
developers list as it doesn't seem like the proper place to ask questions
about end usage patterns.
In short, I am having an issue using MemBufInputSource to take a chunk of
XML contained in a std::wstring and pass it to DOMBuilder::parse. I am
creating the input source as follows:
    MemBufInputSource xmlSource(
        reinterpret_cast<const XMLByte *>(xml.to_utf8()),
        static_cast<const unsigned int>(xml.length() * sizeof(wchar_t)),
        "pidc_rules_file",
        true
    );
Ignore the 'to_utf8' call; it is just one of the encoding-agnostic extensions
we have added to our std::wstring/std::string subclass so that we can still
compile the code as ANSI when required. I am telling the input source to
adopt the buffer that I pass to it, so that it isn't referencing back to
the std::wstring's internal buffer. Note that in this case the code is
compiled with _UNICODE defined, so the to_utf8 call is a simple pass-through
to std::wstring::c_str; there is no XMLString::transcode going on here.
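
A note on the adoption flag described above: as documented for MemBufInputSource, adoptBuffer makes the input source responsible for deleting the buffer; it does not make a copy. Handing over a pointer obtained from c_str() with the flag set to true would therefore have the input source free memory that the std::wstring still owns. One way around this is to pass a separately allocated copy, sketched below with the standard library only. The helper name copy_wide_bytes is hypothetical and not part of Xerces, and the sketch assumes the adopted buffer is released with delete[].

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper (not a Xerces API): copies the raw bytes of a wide
// string into a new[]-allocated array and reports the size in bytes.
// Whoever ends up owning the array must release it with delete[].
unsigned char* copy_wide_bytes(const std::wstring& text, std::size_t& byteCount) {
    // Size in bytes, not characters: each wchar_t occupies sizeof(wchar_t) bytes.
    byteCount = text.length() * sizeof(wchar_t);
    unsigned char* copy = new unsigned char[byteCount];
    std::memcpy(copy, text.data(), byteCount);
    return copy;
}
```

The returned pointer and byte count could then be given to the input source with adoption enabled, so that the std::wstring keeps sole ownership of its own storage. The alternative is to pass the string's buffer with adoption disabled and keep the string alive for as long as the parse runs.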
When I attempt to call DOMBuilder::parse like so:
m_Doc = m_Builder->parse(Wrapper4InputSource(&xmlSource, false));
I get the following error passed to my error handler:
An exception occurred! Type:UTFDataFormatException, Message:invalid byte 2
(±) of a 2-byte sequence.
Obviously it's not liking part of the UTF-8 sequence I am passing it, but
why? The XML itself is fine. I can tell my builder to load the document
from a file using parseURI, and all works well. For functionality reasons,
I need to be able to load an arbitrary chunk of XML outside the scope of
loading a file. Is there another, more intuitive way to do this aside from
creating a MemBufInputSource? Please note, I am doing XML Schema
validation, so I would like to stick to DOMBuilder if possible.
I am pretty new to Xerces (although not XML or XML parsing in general), so
I fear I am simply missing something obvious here.
Any thoughts?
Matt Holmes