Hi Ben,
1) Why do you think that Wrapper4DOMLSInput shouldn't look at the
byteStream? The spec lists this order:
   1. LSInput.characterStream <http://www.w3.org/TR/DOM-Level-3-LS/load-save.html#LS-LSInput-characterStream>
   2. LSInput.byteStream <http://www.w3.org/TR/DOM-Level-3-LS/load-save.html#LS-LSInput-byteStream>
   3. LSInput.stringData <http://www.w3.org/TR/DOM-Level-3-LS/load-save.html#LS-LSInput-stringData>
   4. LSInput.systemId <http://www.w3.org/TR/DOM-Level-3-LS/load-save.html#LS-LSInput-systemId>
   5. LSInput.publicId <http://www.w3.org/TR/DOM-Level-3-LS/load-save.html#LS-LSInput-publicId>
The first item, characterStream (of type LSReader), is not available
in Xerces-C++, and the spec allows this. LSReader is a language- and
binding-dependent type (an Object in the IDL). In the Java binding it
maps to java.io.Reader, which has no direct C++ counterpart.
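The precedence rule can be sketched in a few lines of standalone C++. This sketch does not use Xerces-C++. LSInputSketch and pickSource are hypothetical names, and the struct only models which fields are set. characterStream is left out, as in Xerces-C++.

```cpp
#include <istream>
#include <string>

// Hypothetical stand-in for an LSInput; nullptr means "not set".
// characterStream is omitted because Xerces-C++ does not provide it.
struct LSInputSketch {
    std::istream* byteStream = nullptr;          // raw bytes
    const std::u16string* stringData = nullptr;  // UTF-16 text
    const std::string* systemId = nullptr;
    const std::string* publicId = nullptr;
};

// Names the source a parser would read from: the first one that is not
// null (and, for the string-valued ones, not empty), in spec order.
std::string pickSource(const LSInputSketch& in) {
    if (in.byteStream != nullptr) return "byteStream";
    if (in.stringData != nullptr && !in.stringData->empty()) return "stringData";
    if (in.systemId != nullptr && !in.systemId->empty()) return "systemId";
    if (in.publicId != nullptr && !in.publicId->empty()) return "publicId";
    return "none";
}
```

If both stringData and systemId are set, pickSource returns "stringData". Setting byteStream as well makes byteStream win. Once characterStream is dropped, this is the same order that Wrapper4DOMLSInput::makeStream checks.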
2) The stringData is not being converted. The cast only reinterprets
the XMLCh buffer as bytes. MemBufInputSource works on a byte stream,
so it needs that cast plus a size in bytes, computed by multiplying
sizeof(XMLCh) by the length of the string in UTF-16 code units.
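The size arithmetic can be sketched in plain C++. This assumes char16_t as a stand-in for XMLCh, a 16-bit UTF-16 code unit. stringLenSketch, byteCount and asBytes are hypothetical names, not Xerces-C++ functions.

```cpp
#include <cstddef>

// Assumption: char16_t stands in for XMLCh (one UTF-16 code unit).
using XMLChSketch = char16_t;

// Counts code units up to the terminating NUL, like XMLString::stringLen.
std::size_t stringLenSketch(const XMLChSketch* s) {
    std::size_t n = 0;
    while (s[n] != 0) ++n;
    return n;
}

// Byte count a byte-oriented buffer must be given:
// length in code units times the size of one code unit.
std::size_t byteCount(const XMLChSketch* s) {
    return stringLenSketch(s) * sizeof(XMLChSketch);
}

// The same memory viewed as raw bytes, as the (const XMLByte*) cast does.
// No conversion happens; only the pointer type changes.
const unsigned char* asBytes(const XMLChSketch* s) {
    return reinterpret_cast<const unsigned char*>(s);
}
```

For u"<xs:schema/>" this gives 12 code units, which is 24 bytes with a 2-byte char16_t. Passing only the code-unit count would hand the parser half of the buffer.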
As for the error you see, are you sure your
transcoder->transcode(grammar_str.c_str()) is actually generating a
string of XMLCh? Could you post its code?
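One way to check is to dump the first bytes of whatever transcode() returned. hexDump is a hypothetical debugging helper, not part of Xerces-C++. On a little-endian machine, 16-bit UTF-16 code units for "<xs" print as 3c 00 78 00 73 00. 32-bit units would instead print as 3c 00 00 00 78 00 00 00 73 00 00 00.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats the first `count` bytes at `data` as
// space-separated two-digit lowercase hex, e.g. "3c 00 78 00".
std::string hexDump(const void* data, std::size_t count) {
    const unsigned char* bytes = static_cast<const unsigned char*>(data);
    std::string out;
    char cell[4];  // two hex digits plus the terminating NUL
    for (std::size_t i = 0; i < count; ++i) {
        std::snprintf(cell, sizeof cell, "%02x", static_cast<unsigned>(bytes[i]));
        if (i != 0) out += ' ';
        out += cell;
    }
    return out;
}
```

For example, std::cerr << hexDump(grammar_file, 12) right after the transcode() call shows whether each character occupies 2 bytes or 4.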
Alberto
Ben Griffin wrote:
Okay - I've been staring at this for four days now.
Here is a small example of what is bugging me:
-----------------
#include <iostream>
#include <string>
#include <xercesc/dom/DOM.hpp>
#include <xercesc/util/PlatformUtils.hpp>
#include <xercesc/util/TransService.hpp>
#include <xercesc/util/XMLString.hpp>
#include <xercesc/util/XMLUni.hpp>
#include <xercesc/util/XMLUniDefs.hpp>
#include <xercesc/validators/common/Grammar.hpp>

using namespace xercesc;

// Prints every DOM error and asks the parser to continue.
class Err : public DOMErrorHandler {
public:
    bool handleError(const DOMError& domError) {
        char* msg = XMLString::transcode(domError.getMessage());
        std::cerr << msg << std::endl;
        XMLString::release(&msg);
        return true;
    }
};

int main(int argc, char *argv[]) {
    XMLPlatformUtils::Initialize();
    XMLLCPTranscoder* transcoder =
        XMLPlatformUtils::fgTransService->makeNewLCPTranscoder(XMLPlatformUtils::fgMemoryManager);
    // A string literal cannot span lines; adjacent literals are concatenated.
    std::string grammar_str =
        "<xs:schema targetNamespace=\"http://my.org/blah\" "
        "xmlns:xs=\"http://www.w3.org/2001/XMLSchema\" ><xs:attribute "
        "name=\"box\" fixed=\"true\" /></xs:schema>";
    XMLCh* grammar_file =
        transcoder->transcode(grammar_str.c_str(), XMLPlatformUtils::fgMemoryManager);
    Grammar::GrammarType grammar_type = Grammar::SchemaGrammarType;
    static const XMLCh gLS[] = { chLatin_L, chLatin_S, chNull };  // "LS"
    DOMImplementation* impl = DOMImplementationRegistry::getDOMImplementation(gLS);
    DOMLSParser* parser =
        ((DOMImplementationLS*)impl)->createLSParser(DOMImplementationLS::MODE_SYNCHRONOUS, 0);
    DOMConfiguration* dc = parser->getDomConfig();
    Err* errorHandler = new Err();
    dc->setParameter(XMLUni::fgDOMErrorHandler, errorHandler);
    dc->setParameter(XMLUni::fgXercesUseCachedGrammarInParse, true);
    dc->setParameter(XMLUni::fgXercesSchema, true);
    dc->setParameter(XMLUni::fgXercesCacheGrammarFromParse, true);
    dc->setParameter(XMLUni::fgDOMValidate, true);
    DOMLSInput* input = ((DOMImplementationLS*)impl)->createLSInput();
    input->setStringData(grammar_file);
    parser->loadGrammar(input, grammar_type, true);
    // [...]
}
-----------------------------------------------
IGXMLScanner::scanStartTagNS emits an error because ReaderMgr::getQName
leaves fQNameBuf empty. That happens because isFirstNCNameChar returns
false for the character after '<'.
if (!fReaderMgr.getQName(fQNameBuf, &prefixColonPos)) {
    if (fQNameBuf.isEmpty())
        emitError(XMLErrs::ExpectedElementName); // <-- Error thrown here.
    else
        // [...]

// false being returned by XMLReader::isFirstNCNameChar.
inline bool XMLReader::isFirstNCNameChar(const XMLCh toCheck) const {
    return (((fgCharCharsTable[toCheck] & gFirstNameCharMask) != 0)
            && (toCheck != chColon));
}
The reason is that the schema characters in fCharBuf have been
converted twice. Below is the start of a memory dump of fCharBuf
(note that this is little-endian):
3c 00 00 00 78 00 00 00 73 00 00 00 3a 00 00 00
73 00 00 00 63 00 00 00 68 00 00 00 65 00 00 00
6d 00 00 00 61 00 00 00 20 00 00 00 74 00 00 00
61 00 00 00 72 00 00 00 67 00 00 00 65 00 00 00
74 00 00 00 4e 00 00 00 61 00 00 00 6d 00 00 00
65 00 00 00 73 00 00 00 70 00 00 00 61 00 00 00
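The dump matches what happens when UTF-16LE bytes are decoded a second time as if each byte were one character. Every original byte, including each 00 high byte, becomes its own 16-bit unit. The same bytes would also appear if 32-bit characters were read as UTF-16, so the dump alone does not settle which step went wrong.

The standalone sketch below reproduces the first bytes of the dump. It does not use Xerces-C++, and all of its function names are hypothetical. isFirstNCNameCharAscii is an ASCII-only approximation of the real NCName start rule, which also admits many non-ASCII ranges.

```cpp
#include <string>
#include <vector>

// UTF-16 code units serialized as little-endian bytes, i.e. how a
// 16-bit XMLCh buffer sits in memory on a little-endian machine.
std::vector<unsigned char> toUtf16LEBytes(const std::u16string& s) {
    std::vector<unsigned char> out;
    for (char16_t c : s) {
        out.push_back(static_cast<unsigned char>(c & 0xFF));
        out.push_back(static_cast<unsigned char>((c >> 8) & 0xFF));
    }
    return out;
}

// The mistake: decoding bytes one per character (Latin-1 style),
// so every 00 high byte becomes a U+0000 unit of its own.
std::u16string decodeOneBytePerChar(const std::vector<unsigned char>& bytes) {
    return std::u16string(bytes.begin(), bytes.end());
}

// ASCII-only approximation of the first-NCName-char test:
// a letter or '_' (':' is excluded, as in the Xerces check).
bool isFirstNCNameCharAscii(char16_t c) {
    return (c >= u'A' && c <= u'Z') || (c >= u'a' && c <= u'z') || c == u'_';
}
```

Running u"<xs" through toUtf16LEBytes, then decodeOneBytePerChar, then toUtf16LEBytes again yields 3c 00 00 00 78 00 00 00 73 00 00 00, the start of the dump. The unit right after '<' is U+0000, which cannot start a name. This fits the ExpectedElementName error above.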
#0 0x00fe3453 in xercesc_3_0::Wrapper4DOMLSInput::makeStream at Wrapper4DOMLSInput.cpp:132
#1 0x01011e7b in xercesc_3_0::ReaderMgr::createReader at ReaderMgr.cpp:365
#2 0x0100d6f7 in xercesc_3_0::IGXMLScanner::scanReset at IGXMLScanner2.cpp:1362
#3 0x01003c1b in xercesc_3_0::IGXMLScanner::scanDocument at IGXMLScanner.cpp:197
#4 0x0105b587 in xercesc_3_0::AbstractDOMParser::parse at AbstractDOMParser.cpp:535
#5 0x01008845 in xercesc_3_0::IGXMLScanner::loadXMLSchemaGrammar at IGXMLScanner2.cpp:2085
#6 0x00ffee5f in xercesc_3_0::IGXMLScanner::loadGrammar at IGXMLScanner.cpp:3005
#7 0x010616c9 in xercesc_3_0::DOMLSParserImpl::loadGrammar at DOMLSParserImpl.cpp:935
So here we see the culprit:
BinInputStream* Wrapper4DOMLSInput::makeStream() const {
    // The LSParser will use the LSInput object to determine how to read data.
    // The LSParser will look at the different inputs specified in the LSInput
    // in the following order to know which one to read from, the first one
    // that is not null and not an empty string will be used:
    //   1. LSInput.characterStream
    //   2. LSInput.byteStream
    //   3. LSInput.stringData
    //   4. LSInput.systemId
    //   5. LSInput.publicId
    InputSource* binStream = fInputSource->getByteStream();
    if (binStream)
        return binStream->makeStream();
    const XMLCh* xmlString = fInputSource->getStringData();
    if (xmlString)
    {
        MemBufInputSource is((const XMLByte*)xmlString,
                             XMLString::stringLen(xmlString) * sizeof(XMLCh),
                             "", false, getMemoryManager()); // <--!!!! what?!
        is.setCopyBufToStream(false);
        return is.makeStream();
    }
    // [...]
}
-----------------------------------------------
First of all, the fact that this function looks at the byteStream first
MUST be a bug.
Secondly, the characterStream is being CONVERTED, when it should
already be an XMLCh* (as it is defined everywhere else).
Or am I missing a trick?