Hi Ben,
1) Why do you think that Wrapper4DOMLSInput shouldn't look at the byteStream? The specs list this order:

  1. |LSInput.characterStream|
     <http://www.w3.org/TR/DOM-Level-3-LS/load-save.html#LS-LSInput-characterStream>

  2. |LSInput.byteStream|
     <http://www.w3.org/TR/DOM-Level-3-LS/load-save.html#LS-LSInput-byteStream>

  3. |LSInput.stringData|
     <http://www.w3.org/TR/DOM-Level-3-LS/load-save.html#LS-LSInput-stringData>

  4. |LSInput.systemId|
     <http://www.w3.org/TR/DOM-Level-3-LS/load-save.html#LS-LSInput-systemId>

  5. |LSInput.publicId|
     <http://www.w3.org/TR/DOM-Level-3-LS/load-save.html#LS-LSInput-publicId>


and the first item, characterStream (of type LSReader), is not available in Xerces-C++, as the specs allow (LSReader is a binding-dependent Object type, so its purpose is to allow language-specific character streams, such as java.io.Reader in the Java binding).

2) The stringData is not being converted: MemBufInputSource works on a byte stream, so it needs a cast and a size computed by multiplying sizeof(XMLCh) by the length of the string (in UTF-16 code units).

As for the error you see, are you sure your transcoder->transcode(grammar_str.c_str()) is actually generating a string of XMLCh? Could you post its code?

Alberto

Ben Griffin wrote:
Okay, I've been staring at this for four days now.
Here is a small example of what is bugging me:
-----------------
    #include <iostream>
    #include <string>
    #include <xercesc/dom/DOM.hpp>
    #include <xercesc/util/PlatformUtils.hpp>
    #include <xercesc/util/TransService.hpp>
    #include <xercesc/util/XMLString.hpp>
    #include <xercesc/util/XMLUni.hpp>
    #include <xercesc/util/XMLUniDefs.hpp>
    XERCES_CPP_NAMESPACE_USE

    class Err: public DOMErrorHandler {
    public:
        bool handleError(const DOMError& domError) {
            char* msg = XMLString::transcode(domError.getMessage());
            std::cerr << msg << std::endl;
            XMLString::release(&msg);
            return true;
        }
    };

    int main(int argc, char *argv[]) {
        XMLPlatformUtils::Initialize();
        XMLLCPTranscoder* transcoder = XMLPlatformUtils::fgTransService->makeNewLCPTranscoder(XMLPlatformUtils::fgMemoryManager);
        std::string grammar_str = "<xs:schema targetNamespace=\"http://my.org/blah\" xmlns:xs=\"http://www.w3.org/2001/XMLSchema\"><xs:attribute name=\"box\" fixed=\"true\" /></xs:schema>";
        XMLCh* grammar_file = transcoder->transcode(grammar_str.c_str());
        Grammar::GrammarType grammar_type = Grammar::SchemaGrammarType;
        static const XMLCh gLS[] = { chLatin_L, chLatin_S, chNull };
        DOMImplementation* impl = DOMImplementationRegistry::getDOMImplementation(gLS);
        DOMLSParser* parser = ((DOMImplementationLS*)impl)->createLSParser(DOMImplementationLS::MODE_SYNCHRONOUS, 0);
        DOMConfiguration* dc = parser->getDomConfig();
        Err* errorHandler = new Err();
        dc->setParameter(XMLUni::fgDOMErrorHandler, errorHandler);
        dc->setParameter(XMLUni::fgXercesUseCachedGrammarInParse, true);
        dc->setParameter(XMLUni::fgXercesSchema, true);
        dc->setParameter(XMLUni::fgXercesCacheGrammarFromParse, true);
        dc->setParameter(XMLUni::fgDOMValidate, true);
        DOMLSInput* input = ((DOMImplementationLS*)impl)->createLSInput();
        input->setStringData(grammar_file);
        parser->loadGrammar(input, grammar_type, true);
        // [...]
    }
-----------------------------------------------
IGXMLScanner::scanStartTagNS emits an error because ReaderMgr::getQName does not fill fQNameBuf, which in turn happens because isFirstNCNameChar returns false.

    if (!fReaderMgr.getQName(fQNameBuf, &prefixColonPos)) {
        if (fQNameBuf.isEmpty())
            emitError(XMLErrs::ExpectedElementName); // <-- Error thrown here.
        else


// false is being returned by XMLReader::isFirstNCNameChar:
inline bool XMLReader::isFirstNCNameChar(const XMLCh toCheck) const {
    return (((fgCharCharsTable[toCheck] & gFirstNameCharMask) != 0)
            && (toCheck != chColon));
}

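The reason is that the schema characters in fCharBuf have been converted twice (note that this is little-endian): each character occupies four bytes instead of two, because every byte of the UTF-16 data has become its own XMLCh, so '<' is followed by a 0x0000 unit.
What follows is the start of a memory dump of fCharBuf: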
3c 00 00 00 78 00 00 00 73 00 00 00 3a 00 00 00
73 00 00 00 63 00 00 00 68 00 00 00 65 00 00 00
6d 00 00 00 61 00 00 00 20 00 00 00 74 00 00 00
61 00 00 00 72 00 00 00 67 00 00 00 65 00 00 00
74 00 00 00 4e 00 00 00 61 00 00 00 6d 00 00 00
65 00 00 00 73 00 00 00 70 00 00 00 61 00 00 00

#0 0x00fe3453 in xercesc_3_0::Wrapper4DOMLSInput::makeStream at Wrapper4DOMLSInput.cpp:132
#1 0x01011e7b in xercesc_3_0::ReaderMgr::createReader at ReaderMgr.cpp:365
#2 0x0100d6f7 in xercesc_3_0::IGXMLScanner::scanReset at IGXMLScanner2.cpp:1362
#3 0x01003c1b in xercesc_3_0::IGXMLScanner::scanDocument at IGXMLScanner.cpp:197
#4 0x0105b587 in xercesc_3_0::AbstractDOMParser::parse at AbstractDOMParser.cpp:535
#5 0x01008845 in xercesc_3_0::IGXMLScanner::loadXMLSchemaGrammar at IGXMLScanner2.cpp:2085
#6 0x00ffee5f in xercesc_3_0::IGXMLScanner::loadGrammar at IGXMLScanner.cpp:3005
#7 0x010616c9 in xercesc_3_0::DOMLSParserImpl::loadGrammar at DOMLSParserImpl.cpp:935

// So here we see the culprit:
BinInputStream* Wrapper4DOMLSInput::makeStream() const {
    // The LSParser will use the LSInput object to determine how to read data. The LSParser will look at the different inputs specified in the
    // LSInput in the following order to know which one to read from, the first one that is not null and not an empty string will be used:
    //   1. LSInput.characterStream
    //   2. LSInput.byteStream
    //   3. LSInput.stringData
    //   4. LSInput.systemId
    //   5. LSInput.publicId

    InputSource* binStream=fInputSource->getByteStream();
    if(binStream)
        return binStream->makeStream();
    const XMLCh* xmlString=fInputSource->getStringData();
    if(xmlString)
    {
        MemBufInputSource is((const XMLByte*)xmlString, XMLString::stringLen(xmlString)*sizeof(XMLCh), "", false, getMemoryManager()); // <--!!!! what?!
        is.setCopyBufToStream(false);
        return is.makeStream();
    }
-----------------------------------------------

First of all, the fact that this function looks at the byteStream first MUST be a bug. Secondly, the characterStream is being CONVERTED, when it should already be an XMLCh* (as defined everywhere else).


Or am I missing a trick?

