Hi Ben,
the cast in the MemBufInputSource is fine, as it is simply a wrapper around a bunch of bytes, regardless of which encoding they use. The only thing that can be done to avoid your case (a missing XML header in the string) is adding the call to

is.setEncoding(XMLUni::fgXMLChEncodingString);

after the creation of the object.
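
In context (a minimal sketch; grammarStr stands in for the XMLCh* string you pass via setStringData, and the includes assume Xerces-C 3.x header paths):

    #include <xercesc/framework/MemBufInputSource.hpp>
    #include <xercesc/util/XMLString.hpp>
    #include <xercesc/util/XMLUni.hpp>
    XERCES_CPP_NAMESPACE_USE

    // grammarStr: your null-terminated XMLCh* grammar, with no
    // <?xml ...?> declaration and no BOM
    MemBufInputSource is((const XMLByte*)grammarStr,
                         XMLString::stringLen(grammarStr)*sizeof(XMLCh),
                         "grammar", false);
    // Declare the buffer as native XMLCh (UTF-16) data, so the
    // encoding auto-detection never runs
    is.setEncoding(XMLUni::fgXMLChEncodingString);

This way the reader is told up front what the bytes are, instead of having to probe for them.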

Alberto

Ben Griffin wrote:
Alberto, thanks for your time.

On 11 Mar 2009, at 15:46, Alberto Massari wrote:
Hi Ben,
1) why do you think that Wrapper4DOMLSInput shouldn't look at the byteStream? The specs list this order

Okay - I see that there is no LSInput.characterStream, which is (sort of) fair enough, so I agree that the order is therefore correct.

2) the stringData is not being converted: MemBufInputSource works on a byte stream, so it needs a cast and a size computed by multiplying sizeof(XMLCh) by the length (in UTF-16 chars) of the string.

Well, here I have to disagree. Look at the (fragment of) makeStream below:

            BinInputStream* Wrapper4DOMLSInput::makeStream() const
            {
                // The LSParser will use the LSInput object to determine how to read data.
                // The LSParser will look at the different inputs specified in the LSInput
                // in the following order to know which one to read from; the first one
                // that is not null and not an empty string will be used:
                //   1. LSInput.characterStream
                //   2. LSInput.byteStream
                //   3. LSInput.stringData
                //   4. LSInput.systemId
                //   5. LSInput.publicId
                InputSource* binStream=fInputSource->getByteStream();
                if(binStream)
                    return binStream->makeStream();
--->            const XMLCh* xmlString=fInputSource->getStringData();
                // xmlString is an XMLCh*, as created using LSInput->setStringData()
                if(xmlString)
                {
-->                 MemBufInputSource is((const XMLByte*)xmlString, XMLString::stringLen(xmlString)*sizeof(XMLCh), "", false, getMemoryManager());
                    // So why is it being CAST to XMLByte here?
                    // And now "is" is being instantiated as if xmlString were an XMLByte* ...
                    is.setCopyBufToStream(false);
                    return is.makeStream();
                    // ...which makes a BinInputStream* from "is"

Now, THAT goes on to instantiate an XMLReader, which does an initial load of raw bytes:
    refreshRawBuffer();

and then uses an XMLRecognizer to test the encoding... HANG ON - this is meant to be XMLCh... anyway... that should be FINE, provided it returns the same encoding as XMLCh.

So, being an XMLCh*, the grammar starts (in terms of bytes) with 3c 00.
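
To illustrate (a minimal sketch; chOpenAngle/chNull are the Xerces character constants, and the 3c 00 ordering assumes a little-endian machine):

    #include <xercesc/util/XMLUniDefs.hpp>
    XERCES_CPP_NAMESPACE_USE

    const XMLCh probe[] = { chOpenAngle, chNull };   // the string "<"
    const XMLByte* raw = (const XMLByte*)probe;
    // raw[0] == 0x3c, raw[1] == 0x00 -- the recognizer inspects these
    // raw bytes, not XMLCh units

Those bytes are handed to: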

XMLRecognizer::basicEncodingProbe(const XMLByte* const rawBuffer, const XMLSize_t rawByteCount)

Because this doesn't actually know about BOM-less UTF-16BE or UTF-16LE (i.e. the XMLCh encoding), it is going to return "UTF-8".

Likewise, since the grammar string does not have an <?xml ..?> declaration (which is legal), the XMLRecognizer is going to fail.
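
Putting the two together (a minimal sketch of the probe call; rawBuffer/rawByteCount stand in for the freshly loaded raw bytes of a BOM-less, declaration-less XMLCh string):

    // rawBuffer starts 3c 00 ...: no BOM and no "<?xml" pattern the
    // probe recognises, so it falls through to the default
    XMLRecognizer::Encodings enc =
        XMLRecognizer::basicEncodingProbe(rawBuffer, rawByteCount);
    // enc comes back as XMLRecognizer::UTF_8, not XMLRecognizer::UTF_16L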

As you can imagine, once the BinInputStream has been identified as UTF-8, there really is no turning back.

Sure enough, now AbstractDOMParser::startDocument() calls
fDocument->setInputEncoding(fScanner->getReaderMgr()->getCurrentEncodingStr());

Just in time for
IGXMLScanner::scanDocument(const InputSource& src) to call scanStartTagNS(gotData)

This then hits trouble at (!fReaderMgr.getQName(fQNameBuf, &prefixColonPos)), which returns an empty name, and that empty result will emit an error.


As for the error you see, are you sure your transcoder->transcode(grammar_str.c_str()) is actually generating a string of XMLCh? Could you post its code?

My transcoder?

XMLLCPTranscoder* transcoder = XMLPlatformUtils::fgTransService->makeNewLCPTranscoder(XMLPlatformUtils::fgMemoryManager);
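
For completeness, here is how it gets used (a minimal sketch; grammar_str is a std::string in the local code page, and "input" stands in for my DOMLSInput):

    // transcode from the local code page to XMLCh* (native UTF-16)
    XMLCh* xmlGrammar = transcoder->transcode(grammar_str.c_str());
    input->setStringData(xmlGrammar);
    // xmlGrammar must outlive the parse, and be released afterwards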



Best regards
    Ben.

