Xerces is popping up an exception while parsing a Unicode file, but the same
works fine for an ANSI file
-----------------------------------------------------------------------------------------------------
Key: XERCESC-1955
URL: https://issues.apache.org/jira/browse/XERCESC-1955
Project: Xerces-C++
Issue Type: Bug
Components: DOM
Affects Versions: 3.1.0
Environment: Windows XP 32-bit, Windows 7 64-bit
Reporter: Jojo Jose
Priority: Blocker
Fix For: 3.1.0
Hi All,
Please let me know if anybody can provide some clue on this.
I have been using Xerces as the XML parser in my C++ application, and I
recently migrated from Xerces version 1.3 (very old) to 3.1.
After that, when I call AbstractDOMParser::parse(const xercesc_3_1::InputSource
& source={...}) and pass a Unicode file as input, it raises an exception.
However, the same call works fine for an ANSI file.
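Since the only stated difference between the two inputs is the encoding, one check that might help narrow this down is to look at the leading bytes of the failing file and see which byte-order mark (if any) it starts with. The sketch below uses only the C++ standard library; detectBom and readLeadingBytes are hypothetical helper names and are not part of Xerces. Whether the BOM is actually related to this failure is an assumption.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper (not part of Xerces): names the byte-order mark,
// if any, found at the start of a byte buffer.
std::string detectBom(const std::vector<unsigned char>& b)
{
    // UTF-32 is tested before UTF-16 because FF FE 00 00 also starts with FF FE.
    if (b.size() >= 4 && b[0] == 0xFF && b[1] == 0xFE && b[2] == 0x00 && b[3] == 0x00)
        return "UTF-32LE";
    if (b.size() >= 4 && b[0] == 0x00 && b[1] == 0x00 && b[2] == 0xFE && b[3] == 0xFF)
        return "UTF-32BE";
    if (b.size() >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF)
        return "UTF-8";
    if (b.size() >= 2 && b[0] == 0xFF && b[1] == 0xFE)
        return "UTF-16LE";
    if (b.size() >= 2 && b[0] == 0xFE && b[1] == 0xFF)
        return "UTF-16BE";
    return "none";
}

// Hypothetical helper: reads up to four leading bytes of a file in binary mode.
// Returns an empty vector if the file cannot be opened.
std::vector<unsigned char> readLeadingBytes(const std::string& path)
{
    std::ifstream in(path, std::ios::binary);
    std::vector<unsigned char> bytes;
    char c;
    while (bytes.size() < 4 && in.get(c))
        bytes.push_back(static_cast<unsigned char>(c));
    return bytes;
}
```

Calling detectBom(readLeadingBytes(path)) on both the Unicode file and the ANSI file, and comparing the result with the encoding named in each file's XML declaration, would show whether the two disagree.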
The call stack is as shown below.
xerces-c_3_1.dll!xercesc_3_1::XMLScanner::scanProlog() Line 1227 + 0x25 bytes
xerces-c_3_1.dll!xercesc_3_1::IGXMLScanner::scanDocument(const xercesc_3_1::InputSource & src={...}) Line 210
xerces-c_3_1.dll!xercesc_3_1::AbstractDOMParser::parse(const xercesc_3_1::InputSource & source={...}) Line 549
EPConfigTool.dll!XCfgXMLParser::parse() Line 66 (my application code)
In the code, execution reaches this branch:
else
{
    emitError(XMLErrs::InvalidDocumentStructure);
    ...
}
The function where the parse fails is shown below:
void XMLScanner::scanProlog()
{
    bool sawDocTypeDecl = false;
    // Get a buffer for whitespace processing
    XMLBufBid bbCData(&fBufMgr);
    // Loop through the prolog. If there is no content, this could go all
    // the way to the end of the file.
    try
    {
        while (true)
        {
            const XMLCh nextCh = fReaderMgr.peekNextChar();
            if (nextCh == chOpenAngle)
            {
                // Ok, it could be the xml decl, a comment, the doc type line,
                // or the start of the root element.
                if (checkXMLDecl(true))
                {
                    // There shall be at lease --ONE-- space in between
                    // the tag '<?xml' and the VersionInfo.
                    //
                    // If we are not at line 1, col 6, then the decl was not
                    // the first text, so its invalid.
                    const XMLReader* curReader = fReaderMgr.getCurrentReader();
                    if ((curReader->getLineNumber() != 1)
                        || (curReader->getColumnNumber() != 7))
                    {
                        emitError(XMLErrs::XMLDeclMustBeFirst);
                    }
                    scanXMLDecl(Decl_XML);
                }
                else if (fReaderMgr.skippedString(XMLUni::fgPIString))
                {
                    scanPI();
                }
                else if (fReaderMgr.skippedString(XMLUni::fgCommentString))
                {
                    scanComment();
                }
                else if (fReaderMgr.skippedString(XMLUni::fgDocTypeString))
                {
                    if (sawDocTypeDecl) {
                        emitError(XMLErrs::DuplicateDocTypeDecl);
                    }
                    scanDocTypeDecl();
                    sawDocTypeDecl = true;
                    // if reusing grammar, this has been validated already in first scan
                    // skip for performance
                    if (fValidate && fGrammar && !fGrammar->getValidated()) {
                        // validate the DTD scan so far
                        fValidator->preContentValidation(fUseCachedGrammar, true);
                    }
                }
                else
                {
                    // Assume its the start of the root element
                    return;
                }
            }
            else if (fReaderMgr.getCurrentReader()->isWhitespace(nextCh))
            {
                // If we have a document handler then gather up the
                // whitespace and call back. Otherwise just skip over spaces.
                if (fDocHandler)
                {
                    fReaderMgr.getSpaces(bbCData.getBuffer());
                    fDocHandler->ignorableWhitespace
                    (
                        bbCData.getRawBuffer()
                        , bbCData.getLen()
                        , false
                    );
                }
                else
                {
                    fReaderMgr.skipPastSpaces();
                }
            }
            else
            {
                emitError(XMLErrs::InvalidDocumentStructure);
                // Watch for end of file and break out
                if (!nextCh)
                    break;
                else
                    fReaderMgr.skipPastChar(chCloseAngle);
            }
        }
    }
    catch(const EndOfEntityException&)
    {
        // We should never get an end of entity here. They should only
        // occur within the doc type scanning method, and not leak out to
        // here.
        emitError
        (
            XMLErrs::UnexpectedEOE
            , "in prolog"
        );
    }
}
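Reading scanProlog() above, InvalidDocumentStructure is emitted whenever the character returned by peekNextChar() is neither chOpenAngle nor whitespace. The sketch below is a simplified stand-alone model of that three-way decision, not the Xerces implementation: PrologStep and classifyPrologChar are hypothetical names, and whitespace is assumed to mean the four XML 1.0 whitespace characters. It only illustrates that a character such as a stray byte-order mark (U+FEFF) or a byte-swapped '<' (U+3C00) would land in the error branch if it reached the scanner; whether that is what happens with my file is an assumption.

```cpp
// Outcome of looking at the next character in the prolog
// (hypothetical type, used for illustration only).
enum class PrologStep { Markup, Whitespace, InvalidStructure };

// Hypothetical helper (not part of Xerces): mirrors the if / else-if / else
// chain in scanProlog() for a single UTF-16 code unit.
inline PrologStep classifyPrologChar(char16_t ch)
{
    if (ch == u'<')
        return PrologStep::Markup;            // xml decl, PI, comment, DOCTYPE or root
    if (ch == u' ' || ch == u'\t' || ch == u'\n' || ch == u'\r')
        return PrologStep::Whitespace;        // skipped or reported as ignorable
    return PrologStep::InvalidStructure;      // the branch that emits the error
}
```

In this model any first character other than '<' or whitespace produces the same error, which would explain why the error alone does not say what the scanner actually saw.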
It works fine when I move back to version 1.3, but due to various other
requirements I have to use version 3.1 in my application.
Thanks in advance,
Jojo