Alberto, thanks for your time.

On 11 Mar 2009, at 15:46, Alberto Massari wrote:
Hi Ben,
1) why do you think that Wrapper4LSInput shouldn't look at the byteStream? The specs list this order

Okay - I see that there is no LSInput.characterStream, which is (sort of) fair enough, so I agree that the order is therefore correct.

2) the stringData is not being converted: MemBufInputSource works on a byte stream, so it needs a cast and a size computed by multiplying sizeof(XMLCh) by the length (in UTF-16 chars) of the string.

Well, here I have to disagree. Look at the fragment of makeStream below:

    BinInputStream* Wrapper4DOMLSInput::makeStream() const {
        // The LSParser will use the LSInput object to determine how to read data.
        // The LSParser will look at the different inputs specified in the LSInput
        // in the following order to know which one to read from, the first one
        // that is not null and not an empty string will be used:
        //   1. LSInput.characterStream
        //   2. LSInput.byteStream
        //   3. LSInput.stringData
        //   4. LSInput.systemId
        //   5. LSInput.publicId
        InputSource* binStream=fInputSource->getByteStream();
        if(binStream)
            return binStream->makeStream();
--->    const XMLCh* xmlString=fInputSource->getStringData();
        // xmlString is an XMLCh*, as created using LSInput->setStringData()

        if(xmlString)
        {
--->        MemBufInputSource is((const XMLByte*)xmlString, XMLString::stringLen(xmlString)*sizeof(XMLCh), "", false, getMemoryManager());
            // So why is it being CAST into XMLByte here?
            // And now "is" is being instantiated as if the xmlString is an XMLByte* ....

            is.setCopyBufToStream(false);
            return is.makeStream();

            // ...which makes a BinInputStream* from "is"

Now, THAT goes on to instantiate an XMLReader, which does an initial load of raw bytes:
    refreshRawBuffer();

and then uses an XMLRecognizer to test the encoding. HANG ON, this is meant to be XMLCh... anyway, that should be FINE as long as it reports the same encoding as XMLCh.

So, being an XMLCh*, the grammar starts (in terms of bytes) with 3c 00.
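
To make the byte layout concrete, here is a minimal sketch that is independent of Xerces. It assumes XMLCh is a 16-bit UTF-16 code unit and uses char16_t as a stand-in; rawBytes is a hypothetical helper, not a Xerces function. It exposes the buffer the same way the cast plus length * sizeof(XMLCh) does:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical helper: view a buffer of UTF-16 code units as raw bytes,
// the way a cast to a byte pointer plus (units * sizeof(unit)) exposes it.
// char16_t stands in for XMLCh here (assumption: XMLCh is 16 bits wide).
std::vector<unsigned char> rawBytes(const char16_t* text, std::size_t units) {
    assert(text != nullptr || units == 0);
    std::vector<unsigned char> bytes(units * sizeof(char16_t));
    if (!bytes.empty())
        std::memcpy(bytes.data(), text, bytes.size());
    return bytes;
}
```

For u"<schema" this yields 14 bytes, and on a little-endian host the first two are 3c 00 (on a big-endian host they would be 00 3c). Either way, every ASCII character is followed or preceded by a zero byte.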

XMLRecognizer::basicEncodingProbe( const XMLByte* const rawBuffer , const XMLSize_t rawByteCount)

Because this doesn't actually know about BOM-less UTF-16BE or UTF-16LE (i.e., the XMLCh encoding), it is going to return "UTF-8".

Likewise, because the grammar string does not have an <?xml ..> declaration (which is legal), the XMLRecognizer is going to fail.

As you can imagine, once the BinInputStream has been identified as UTF-8, there really is no turning back.
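
The fallback I am describing can be sketched roughly as follows. This is NOT the Xerces code: probeEncoding is a hypothetical, much simplified stand-in that only knows about UTF-16 byte order marks and an "<?xml" prefix, which is all that is needed to show the behaviour. It also ignores UTF-32 and EBCDIC entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical, simplified encoding probe (not the Xerces implementation).
// It recognises UTF-16 only through a BOM or a leading "<?xml" and
// otherwise falls back to UTF-8.
std::string probeEncoding(const unsigned char* buf, std::size_t len) {
    assert(buf != nullptr || len == 0);

    // Byte order marks.
    if (len >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) return "UTF-16LE";
    if (len >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) return "UTF-16BE";

    // "<?xml" spelled out as UTF-16 code units, without a BOM.
    static const unsigned char le[] = {'<', 0, '?', 0, 'x', 0, 'm', 0, 'l', 0};
    static const unsigned char be[] = {0, '<', 0, '?', 0, 'x', 0, 'm', 0, 'l'};
    if (len >= sizeof(le) && std::memcmp(buf, le, sizeof(le)) == 0) return "UTF-16LE";
    if (len >= sizeof(be) && std::memcmp(buf, be, sizeof(be)) == 0) return "UTF-16BE";

    // Nothing recognisable: assume UTF-8.
    return "UTF-8";
}
```

A buffer that starts 3c 00 73 00 (that is, "<s" as XMLCh with no BOM and no declaration) matches none of the UTF-16 checks, so a probe of this shape reports "UTF-8" even though the data is really UTF-16.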

Sure enough, now AbstractDOMParser::startDocument() calls
fDocument->setInputEncoding(fScanner->getReaderMgr()->getCurrentEncodingStr());

Just in time for
IGXMLScanner::scanDocument(const InputSource& src) to call scanStartTagNS(gotData)

This then hits trouble at (!fReaderMgr.getQName(fQNameBuf, &prefixColonPos)), which returns an empty name,
and the empty name causes an error to be emitted.
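
To illustrate why the name comes back empty, here is a hypothetical and heavily simplified name scanner. It is not what getQName does internally; it only shows the effect of reading UTF-16 bytes one byte at a time: the byte right after '<' is 00, which cannot be part of a name.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical, simplified scanner: collects ASCII name characters starting
// at pos and stops at the first byte that cannot belong to a name.
// Real XML name rules are stricter; this is only for illustration.
std::string scanName(const unsigned char* buf, std::size_t len, std::size_t pos) {
    assert(buf != nullptr || len == 0);
    std::string name;
    while (pos < len) {
        const unsigned char c = buf[pos];
        const bool nameChar =
            std::isalnum(c) || c == '_' || c == ':' || c == '-' || c == '.';
        if (!nameChar)
            break;  // a 0x00 byte from UTF-16 data ends the name immediately
        name.push_back(static_cast<char>(c));
        ++pos;
    }
    return name;
}
```

With the 8-bit bytes of "<schema>" the scan starting after '<' returns "schema"; with the bytes 3c 00 73 00 63 00 it returns an empty string, which matches the failure described above.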


As for the error you see, are you sure your transcoder->transcoder(grammar_str.c_str()) is actually generating a string of XMLCh? Could you post its code?

My transcoder?

XMLLCPTranscoder* transcoder = XMLPlatformUtils::fgTransService->makeNewLCPTranscoder(XMLPlatformUtils::fgMemoryManager);



Best regards
        Ben.
