Alberto, thanks for your time.
On 11 Mar 2009, at 15:46, Alberto Massari wrote:
Hi Ben,
1) why do you think that Wrapper4DOMLSInput shouldn't look at the
byteStream? The specs list this order
Okay - I see that there is no LSInput.characterStream, which is (sort
of) fair enough, so I agree that the order is therefore correct.
2) the stringData is not being converted: MemBufInputSource works on
a byte stream, so it needs a cast and a size computed by multiplying
sizeof(XMLCh) by the length (in UTF-16 chars) of the string.
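To make the cast-and-size step concrete, here is a minimal standalone sketch in plain C++ (no Xerces). It assumes XMLCh is a 16-bit UTF-16 code unit and uses char16_t as a stand-in for it; ByteView and asByteBuffer are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Stand-in for Xerces' XMLCh (assumption: a 16-bit UTF-16 code unit).
using XMLChLike = char16_t;

// Hypothetical view of a string's memory as raw bytes.
struct ByteView {
    const unsigned char* bytes;
    std::size_t byteCount;
};

// Hypothetical helper mirroring what makeStream() does with stringData:
// reinterpret the UTF-16 buffer as bytes and size it in bytes, not chars.
inline ByteView asByteBuffer(const XMLChLike* str) {
    // Length in UTF-16 code units (what XMLString::stringLen reports).
    std::size_t units = std::char_traits<XMLChLike>::length(str);
    // unsigned char may alias any object, so this view is well defined.
    return ByteView{reinterpret_cast<const unsigned char*>(str),
                    units * sizeof(XMLChLike)};
}
```

For u"<a/>" (4 code units) the view covers 8 bytes; no conversion happens, the bytes are simply handed over as-is.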
Well, here I have to disagree. Look at the makeStream() fragment
below:
BinInputStream* Wrapper4DOMLSInput::makeStream() const
{
    // The LSParser will use the LSInput object to determine how to read
    // data. The LSParser will look at the different inputs specified in
    // the LSInput in the following order to know which one to read from,
    // the first one that is not null and not an empty string will be used:
    //   1. LSInput.characterStream
    //   2. LSInput.byteStream
    //   3. LSInput.stringData
    //   4. LSInput.systemId
    //   5. LSInput.publicId
    InputSource* binStream = fInputSource->getByteStream();
    if (binStream)
        return binStream->makeStream();

    const XMLCh* xmlString = fInputSource->getStringData();   // <---
    // xmlString is an XMLCh*, as created using LSInput->setStringData()
    if (xmlString)
    {
        MemBufInputSource is((const XMLByte*)xmlString,        // <---
                             XMLString::stringLen(xmlString) * sizeof(XMLCh),
                             "", false, getMemoryManager());
        // So why is it being CAST to XMLByte* here?
        // And now "is" is being instantiated as if xmlString were an
        // XMLByte*...
        is.setCopyBufToStream(false);
        return is.makeStream();
        // ...which makes a BinInputStream* from "is"
    }
    // ... (remainder of makeStream() not shown)
}
Now, THAT goes on to instantiate an XMLReader, which does an initial load
of raw bytes:
refreshRawBuffer();
and then uses an XMLRecognizer to detect the encoding... HANG ON, this
is meant to be XMLCh...
...anyway, that should be FINE if it detects the same encoding as
XMLCh (UTF-16 in the host's byte order).
So, being an XMLCh*, the grammar starts (in terms of bytes, on a
little-endian machine) with 3c 00.
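The "3c 00" byte pattern can be checked with a small standalone C++ sketch. It again assumes XMLCh behaves like char16_t; hostIsLittleEndian and firstBytes are hypothetical helpers, not Xerces functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// True when the host stores multi-byte integers least-significant byte
// first (little-endian), as x86 does.
inline bool hostIsLittleEndian() {
    const unsigned short probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Copy the first n bytes of a UTF-16 (XMLCh-like) string's in-memory image.
inline void firstBytes(const char16_t* str, unsigned char* out, std::size_t n) {
    std::memcpy(out, str, n);
}
```

On a little-endian host, u"<g" starts with the bytes 3c 00 67 00: every ASCII character is followed by a zero byte, and there is no BOM in front of it.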
In XMLRecognizer::basicEncodingProbe(const XMLByte* const rawBuffer,
const XMLSize_t rawByteCount), because this doesn't actually know
about BOM-less UTF-16BE or UTF-16LE (i.e., the XMLCh encoding), it is
going to return "UTF-8".
Likewise, because the grammar string has no <?xml ...?> declaration
(which is legal), the XMLRecognizer's detection is going to fail.
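The decision described above can be approximated by the following deliberately simplified probe. It is a hypothetical sketch (Enc and probeEncoding are illustrative names, not the Xerces API, and the real basicEncodingProbe checks more patterns, e.g. UCS-4 and EBCDIC), but it shows the same outcome: a BOM-less UTF-16 buffer without "<?xml" falls through to UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical, simplified encoding probe (not the Xerces implementation).
enum class Enc { UTF8, UTF16LE, UTF16BE };

inline Enc probeEncoding(const unsigned char* raw, std::size_t n) {
    // 1. Byte order marks.
    if (n >= 2 && raw[0] == 0xFE && raw[1] == 0xFF) return Enc::UTF16BE;
    if (n >= 2 && raw[0] == 0xFF && raw[1] == 0xFE) return Enc::UTF16LE;
    if (n >= 3 && raw[0] == 0xEF && raw[1] == 0xBB && raw[2] == 0xBF)
        return Enc::UTF8;
    // 2. No BOM: look for "<?xm" encoded as UTF-16.
    static const unsigned char le[] = {'<', 0, '?', 0, 'x', 0, 'm', 0};
    static const unsigned char be[] = {0, '<', 0, '?', 0, 'x', 0, 'm'};
    if (n >= 8 && std::memcmp(raw, le, 8) == 0) return Enc::UTF16LE;
    if (n >= 8 && std::memcmp(raw, be, 8) == 0) return Enc::UTF16BE;
    // 3. Nothing recognised: fall back to UTF-8.
    return Enc::UTF8;
}
```

With this logic, the bytes 3c 00 67 00 ... of a declaration-less grammar string are classified as UTF-8, while the same string with a leading FF FE BOM or an "<?xml" declaration would be recognised as UTF-16LE.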
As you can imagine, once the BinInputStream has been identified as
UTF-8, there really is no turning back.
Sure enough, AbstractDOMParser::startDocument() now calls
fDocument->setInputEncoding(fScanner->getReaderMgr()->getCurrentEncodingStr());
Just in time for IGXMLScanner::scanDocument(const InputSource& src) to
call scanStartTagNS(gotData).
This then hits trouble at !fReaderMgr.getQName(fQNameBuf,
&prefixColonPos), which returns empty, and the empty name causes an
error to be emitted.
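Why the QName comes back empty can be seen with a toy scanner. Once bytes 3c 00 are decoded as UTF-8, the character after '<' is U+0000, which can never be part of a name. The helpers below are hypothetical and ASCII-only (they are not how IGXMLScanner or getQName are implemented, and they ignore the stricter rules for a name's first character).

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified, ASCII-only test for characters allowed in a name.
inline bool isAsciiNameChar(unsigned char c) {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == '_' || c == '-' || c == '.' ||
           c == ':';
}

// Hypothetical stand-in for reading a start-tag name: after '<', collect
// name characters from a buffer treated as UTF-8 (one byte per ASCII char).
inline std::string scanNameAfterLt(const unsigned char* raw, std::size_t n) {
    std::string name;
    if (n == 0 || raw[0] != '<') return name;  // not a start tag
    for (std::size_t i = 1; i < n && isAsciiNameChar(raw[i]); ++i)
        name.push_back(static_cast<char>(raw[i]));
    return name;  // empty when the byte after '<' is not a name character
}
```

Fed the UTF-16LE bytes of "<g/>", the scan stops at the 00 byte and yields an empty name; fed the genuine UTF-8 bytes of "<grammar/>", it yields "grammar".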
As for the error you see, are you sure your
transcoder->transcode(grammar_str.c_str()) is actually generating a
string of XMLCh? Could you post its code?
My transcoder?
XMLLCPTranscoder* transcoder =
    XMLPlatformUtils::fgTransService->makeNewLCPTranscoder(XMLPlatformUtils::fgMemoryManager);
Best regards
Ben.