Re: MemBufInputSource UTF-8 issue

Alberto Massari Fri, 24 Jun 2005 01:26:28 -0700

Hi Matt,

If you are stuffing wchar_t chars, but the string contains the directive <?xml encoding="utf-8"?>, you need to force UTF-16 as the actual encoding of the buffer by calling xmlSource.setEncoding("UTF-16LE").
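
To illustrate why the declared encoding and the actual bytes disagree, here is a minimal sketch in standard C++ only (no Xerces calls). It assumes a platform where wchar_t holds UTF-16 code units, as on Windows, and it uses char16_t so the result does not depend on sizeof(wchar_t). The helpers to_utf16le_bytes and looks_like_utf8 are hypothetical names made up for this example, and the check is only structural, so it is looser than a real UTF-8 decoder.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: lay out UTF-16 code units as little-endian bytes,
// i.e. the byte stream that the label "UTF-16LE" describes.
std::vector<unsigned char> to_utf16le_bytes(const std::u16string& text) {
    std::vector<unsigned char> bytes;
    bytes.reserve(text.size() * 2);
    for (char16_t unit : text) {
        bytes.push_back(static_cast<unsigned char>(unit & 0xFF));         // low byte first
        bytes.push_back(static_cast<unsigned char>((unit >> 8) & 0xFF));  // then high byte
    }
    return bytes;
}

// Hypothetical helper: structural UTF-8 check. Every lead byte must be
// followed by the right number of 10xxxxxx continuation bytes. Overlong
// forms and surrogates are NOT rejected here.
bool looks_like_utf8(const std::vector<unsigned char>& bytes) {
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = bytes[i];
        std::size_t extra = 0;
        if (lead < 0x80) {
            extra = 0;                       // single-byte (ASCII range)
        } else if ((lead & 0xE0) == 0xC0) {
            extra = 1;                       // 2-byte sequence
        } else if ((lead & 0xF0) == 0xE0) {
            extra = 2;                       // 3-byte sequence
        } else if ((lead & 0xF8) == 0xF0) {
            extra = 3;                       // 4-byte sequence
        } else {
            return false;                    // stray continuation or invalid lead byte
        }
        if (extra > bytes.size() - i - 1) {
            return false;                    // sequence is cut off at the end
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            if ((bytes[i + k] & 0xC0) != 0x80) {
                return false;                // expected a continuation byte
            }
        }
        i += extra + 1;
    }
    return true;
}
```

For the string u"<a>\u00E9</a>", the é becomes the bytes E9 00. Read as UTF-8, E9 announces a three-byte sequence, but 00 is not a continuation byte, so the structural check fails. That is the same kind of mismatch a decoder runs into when it trusts the utf-8 declaration while the buffer really holds UTF-16LE.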


Hope this helps,
Alberto

At 15.51 23/06/2005 -0500, Matt Holmes wrote:

Hello all,
I hope this is the correct place to ask this. I didn't want to ask on the developers list as it doesn't seem like the proper place to ask questions about end usage patterns.
In short, I am having an issue using the MemBufInputSource to take a chunk of XML contained in a std::wstring and pass it to DOMBuilder::parse. I am creating the input source as such:
MemBufInputSource xmlSource(
        reinterpret_cast<const XMLByte *>(xml.to_utf8()),
        static_cast<const unsigned int>(xml.length() * sizeof(wchar_t)),
        "pidc_rules_file",
        true
);
Ignore the 'to_utf8' call, that is just some encoding agnostic extensions we have added to our std::wstring/std::string sub-class so we can actually compile the code ANSI when required. I am telling the input source to adopt the buffer that I pass to it, so that it isn't referencing back to the std::wstring's internal buffer. Note in this case the code is compiled _UNICODE, so the to_utf8 call is a simple pass-through to std::wstring::c_str, there is no XMLString::transcode going on here.
When I attempt to call DOMBuilder::parse like so:

m_Doc = m_Builder->parse(Wrapper4InputSource(&xmlSource, false));

I get the following error passed to my error handler:
An exception occurred! Type:UTFDataFormatException, Message:invalid byte 2(±) of a 2-byte sequence.
Obviously it's not liking part of the UTF-8 sequence I am passing it, but why? The XML itself is fine. I can tell my builder to load the document from a file using parseURI, and all works well. For functionality reasons, I need to be able to load an arbitrary chunk of XML outside the scope of loading a file. Is there another, more intuitive way to do this aside from creating a MemBufInputSource? Please note, I am doing XML Schema validation, so I would like to stick to DOMBuilder if possible.
I am pretty new to Xerces (although not XML or XML parsing in general), so I fear I am simply missing something obvious here.
Any thoughts?

Matt Holmes
