Hi Ben,
the cast in the MemBufInputSource is fine, as it is simply a wrapper for
a bunch of bytes, regardless of which encoding they are using. The only
thing that can be done to avoid your case (a missing XML declaration in
the string) is to add the call
is.setEncoding(XMLUni::fgXMLChEncodingString);
after the creation of the object.
Alberto
Ben Griffin wrote:
Alberto, thanks for your time.
On 11 Mar 2009, at 15:46, Alberto Massari wrote:
Hi Ben,
1) why do you think that Wrapper4LSInput shouldn't look at the
byteStream? The specs list this order
Okay - I see that there is no LSInput.characterStream, which is (sort
of) fair enough, so I agree that the order is therefore correct.
2) the stringData is not being converted: MemBufInputSource works on
a byte stream, so it needs a cast and a size computed by multiplying
sizeof(XMLCh) by the length (in UTF-16 chars) of the string.
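To make the size computation above concrete, here is a minimal sketch that does not use Xerces at all. It assumes XMLCh is a 16-bit code unit and uses char16_t as a stand-in; the names Utf16Unit, byteLength and rawBytes are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Stand-in for Xerces' XMLCh, assumed here to be a 16-bit code unit.
using Utf16Unit = char16_t;

// Number of bytes occupied by a NUL-terminated UTF-16 string
// (terminator excluded): length in code units times the unit size.
std::size_t byteLength(const Utf16Unit* s) {
    return std::char_traits<Utf16Unit>::length(s) * sizeof(Utf16Unit);
}

// Copy the raw bytes of the string, exactly as a byte-oriented
// input source would see them after the cast.
std::vector<unsigned char> rawBytes(const Utf16Unit* s) {
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s);
    return std::vector<unsigned char>(p, p + byteLength(s));
}
```

With this, a four-character string such as u"<a/>" is seen as eight bytes; the cast does not convert anything, it only changes how the same memory is viewed.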
Well, here I have to disagree. Look at the fragment of makeStream
below:
BinInputStream* Wrapper4DOMLSInput::makeStream() const {
// The LSParser will use the LSInput object to determine how to read data.
// The LSParser will look at the different inputs specified in the LSInput
// in the following order to know which one to read from; the first one
// that is not null and not an empty string will be used:
// 1. LSInput.characterStream
// 2. LSInput.byteStream
// 3. LSInput.stringData
// 4. LSInput.systemId
// 5. LSInput.publicId
InputSource* binStream=fInputSource->getByteStream();
if(binStream)
return binStream->makeStream();
---> const XMLCh* xmlString=fInputSource->getStringData();
// xmlString is a XMLCh*, as created using LSInput->setStringData()
if(xmlString)
{
--> MemBufInputSource is((const XMLByte*)xmlString,
XMLString::stringLen(xmlString)*sizeof(XMLCh), "", false,
getMemoryManager());
//So why is it being CAST into XMLByte here?
//And now "is" is being instantiated as if the xmlString is a XMLByte*
....
is.setCopyBufToStream(false);
return is.makeStream();
//...which makes a BinInputStream* from "is"
Now, THAT goes on to instantiate an XMLReader, which does an initial load
of raw bytes:
refreshRawBuffer();
and then uses an XMLRecognizer to test the encoding. HANG ON -
this is meant to be XMLCh...
... anyway... That should be FINE if it returns the same encoding as
XMLCh.
So, being an XMLCh*, the grammar starts (in terms of bytes) with 3c 00:
XMLRecognizer::basicEncodingProbe( const XMLByte* const rawBuffer
, const XMLSize_t rawByteCount)
Because this doesn't actually know about BOM-less UTF-16BE or UTF-16LE
(i.e. the XMLCh encoding), it is going to return "UTF-8".
Likewise, because the grammar string does not have an <?xml ..> declaration
(which is legal), the XMLRecognizer is going to fail.
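The behaviour described here can be sketched with a simplified probe. This is NOT the Xerces implementation of basicEncodingProbe; the function probeEncoding and its rules are assumptions made for illustration only: it recognizes a UTF-16 BOM or the "<?" that opens an XML declaration, and otherwise falls back to UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Simplified stand-in for an encoding probe (hypothetical, not Xerces code).
std::string probeEncoding(const unsigned char* buf, std::size_t n) {
    // A byte order mark identifies UTF-16 unambiguously.
    if (n >= 2 && buf[0] == 0xFF && buf[1] == 0xFE) return "UTF-16LE";
    if (n >= 2 && buf[0] == 0xFE && buf[1] == 0xFF) return "UTF-16BE";

    // Without a BOM, only "<?" (the start of an XML declaration)
    // is accepted as evidence of UTF-16 in this sketch.
    if (n >= 4 && buf[0] == 0x3C && buf[1] == 0x00 &&
                  buf[2] == 0x3F && buf[3] == 0x00) return "UTF-16LE";
    if (n >= 4 && buf[0] == 0x00 && buf[1] == 0x3C &&
                  buf[2] == 0x00 && buf[3] == 0x3F) return "UTF-16BE";

    // Everything else, including "3c 00 <letter> 00", falls through.
    return "UTF-8";
}
```

Under these assumed rules, the bytes 3c 00 67 00 (a UTF-16LE "<g" with no BOM and no declaration) are reported as UTF-8, which is the situation the grammar string ends up in.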
As you can imagine, once the BinInputStream has been identified as
UTF-8, there really is no turning back.
Sure enough, now AbstractDOMParser::startDocument() calls
fDocument->setInputEncoding(fScanner->getReaderMgr()->getCurrentEncodingStr());
Just in time for
IGXMLScanner::scanDocument(const InputSource& src) to call
scanStartTagNS(gotData)
This then hits trouble at (!fReaderMgr.getQName(fQNameBuf,
&prefixColonPos)), which returns an empty name,
and the empty name will emit an Error.
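Why the name comes back empty can be shown with a small sketch, independent of Xerces. It assumes the reader treats the buffer as one byte per character (as it would for ASCII-range UTF-8) and uses a deliberately simplified notion of a name character; isNameByte and nameLengthAfterLt are hypothetical helpers, not the scanner's real logic.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>

// Simplified test for an ASCII XML name character (assumption: the real
// rules distinguish name-start characters and cover far more of Unicode).
bool isNameByte(unsigned char c) {
    return std::isalnum(c) != 0 || c == '_' || c == ':' || c == '-' || c == '.';
}

// Length of the element name that follows '<' when the buffer is read
// one byte per character. Returns 0 if no name can be read.
std::size_t nameLengthAfterLt(const unsigned char* buf, std::size_t n) {
    if (n == 0 || buf[0] != '<') return 0;
    std::size_t i = 1;
    while (i < n && isNameByte(buf[i])) ++i;
    return i - 1;
}
```

For the UTF-16LE bytes 3c 00 67 00 the byte right after '<' is 00, which is not a name character, so the name length is zero. The same text stored as plain ASCII would yield the full element name.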
As for the error you see, are you sure your
transcoder->transcode(grammar_str.c_str()) is actually generating a
string of XMLCh? Could you post its code?
My transcoder?
XMLLCPTranscoder* transcoder =
XMLPlatformUtils::fgTransService->makeNewLCPTranscoder(XMLPlatformUtils::fgMemoryManager);
Best regards
Ben.