Hi Matt,
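If you are stuffing wchar_t chars into the buffer, but the string contains the directive <?xml encoding="utf-8"?>, you need to force UTF-16 as the actual encoding of the buffer by calling xmlSource.setEncoding("UTF-16LE") before parsing. Note that setEncoding takes an XMLCh string, so the encoding name has to be supplied in that form (for example via XMLString::transcode).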
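To see why the parser complains, here is a minimal sketch that uses only the standard library and makes no Xerces calls. It assumes 2-byte little-endian wide characters, as on Windows, modelled here with char16_t so that it behaves the same on any platform. The helper names utf16le_bytes and is_structurally_valid_utf8 are made up for illustration. It lays the wide string out as raw bytes, the way the reinterpret_cast does, and then checks whether those bytes form well-formed UTF-8 sequences.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Lay out UTF-16 code units as little-endian bytes. This mirrors what a
// reinterpret_cast<const XMLByte*> sees when wchar_t is 2 bytes wide
// (an assumption; wchar_t is 4 bytes on many other platforms).
std::vector<unsigned char> utf16le_bytes(const std::u16string& s) {
    std::vector<unsigned char> out;
    out.reserve(s.size() * sizeof(char16_t));
    for (char16_t cu : s) {
        out.push_back(static_cast<unsigned char>(cu & 0xFF));         // low byte first
        out.push_back(static_cast<unsigned char>((cu >> 8) & 0xFF));  // then high byte
    }
    return out;
}

// Structural UTF-8 check only: every lead byte must be followed by the
// right number of continuation bytes (10xxxxxx). Overlong forms and
// surrogates are not rejected; that is enough for this illustration.
bool is_structurally_valid_utf8(const std::vector<unsigned char>& b) {
    std::size_t i = 0;
    while (i < b.size()) {
        const unsigned char c = b[i];
        std::size_t extra = 0;
        if (c < 0x80)                extra = 0;
        else if ((c & 0xE0) == 0xC0) extra = 1;
        else if ((c & 0xF0) == 0xE0) extra = 2;
        else if ((c & 0xF8) == 0xF0) extra = 3;
        else return false;  // stray continuation byte or invalid lead byte

        if (extra > b.size() - i - 1) return false;  // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            if ((b[i + k] & 0xC0) != 0x80) return false;  // bad continuation byte
        }
        i += extra + 1;
    }
    return true;
}
```

Feeding it u"<a>\u00B1</a>" gives 16 bytes, and the check fails at the 0xB1 byte of the '±' character, which is not a valid UTF-8 lead byte. The bytes are fine as UTF-16LE; they are only wrong when decoded as UTF-8, which is what the encoding="utf-8" declaration asks the parser to do.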

Hope this helps,
Alberto

At 15.51 23/06/2005 -0500, Matt Holmes wrote:
Hello all,

I hope this is the correct place to ask this. I didn't want to ask on the developers list as it doesn't seem like the proper place to ask questions about end usage patterns.

In short, I am having an issue using the MemBufInputSource to take a chunk of XML contained in a std::wstring and pass it to DOMBuilder::parse. I am creating the input source as such:

MemBufInputSource xmlSource(
        reinterpret_cast<const XMLByte *>(xml.to_utf8()),
        static_cast<const unsigned int>(xml.length() * sizeof(wchar_t)),
        "pidc_rules_file",
        true
);

Ignore the 'to_utf8' call; it is just one of the encoding-agnostic extensions we have added to our std::wstring/std::string subclass so that we can compile the code as ANSI when required. I am telling the input source to adopt the buffer that I pass to it, so that it isn't referencing back to the std::wstring's internal buffer. Note that in this case the code is compiled with _UNICODE, so the to_utf8 call is a simple pass-through to std::wstring::c_str; there is no XMLString::transcode going on here.

When I attempt to call DOMBuilder::parse like so:

m_Doc = m_Builder->parse(Wrapper4InputSource(&xmlSource, false));

I get the following error passed to my error handler:

An exception occurred! Type:UTFDataFormatException, Message:invalid byte 2 (±) of a 2-byte sequence.

Obviously it's not liking part of the UTF-8 sequence I am passing it, but why? The XML itself is fine. I can tell my builder to load the document from a file using parseURI, and all works well. For functionality reasons, I need to be able to load an arbitrary chunk of XML outside the scope of loading a file. Is there another, more intuitive way to do this aside from creating a MemBufInputSource? Please note, I am doing XML Schema validation, so I would like to stick to DOMBuilder if possible.

I am pretty new to Xerces (although not XML or XML parsing in general), so I fear I am simply missing something obvious here.

Any thoughts?

Matt Holmes

