Xerces throws an exception while parsing a Unicode file, but the same works 
fine for an ANSI file
-----------------------------------------------------------------------------------------------------

                 Key: XERCESC-1955
                 URL: https://issues.apache.org/jira/browse/XERCESC-1955
             Project: Xerces-C++
          Issue Type: Bug
          Components: DOM
    Affects Versions: 3.1.0
         Environment: Windows XP 32-bit, Windows 7 64-bit
            Reporter: Jojo Jose
            Priority: Blocker
             Fix For: 3.1.0


Hi All,

Please let me know if anybody can provide a clue on this.

I have been using Xerces as the XML parser in my C++ application, and I 
recently migrated from Xerces version 1.3 (very old) to 3.1.

Since then, when I call AbstractDOMParser::parse(const xercesc_3_1::InputSource 
& source={...}) and pass a Unicode file as input, it throws an exception. 
However, the same call works fine for an ANSI file.
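
Since "Unicode" on Windows often means UTF-16LE with a byte-order mark, it may be worth confirming what the failing file actually starts with before digging into the parser. This is only an assumption about a possible cause; nothing in the trace confirms it. The sketch below uses the standard library only, and `detectBom` is a hypothetical helper, not a Xerces-C++ function.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of Xerces-C++): returns the encoding implied
// by a byte-order mark at the start of the buffer, or "none" if there is none.
std::string detectBom(const unsigned char* data, std::size_t size)
{
    assert(data != nullptr || size == 0);  // a non-empty buffer needs a valid pointer

    // UTF-32 is tested before UTF-16 because FF FE 00 00 also begins with FF FE.
    if (size >= 4 && data[0] == 0xFF && data[1] == 0xFE && data[2] == 0x00 && data[3] == 0x00)
        return "UTF-32LE";
    if (size >= 4 && data[0] == 0x00 && data[1] == 0x00 && data[2] == 0xFE && data[3] == 0xFF)
        return "UTF-32BE";
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return "UTF-8";
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return "UTF-16LE";
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return "UTF-16BE";
    return "none";
}
```

Running this on the first few bytes of both the working ANSI file and the failing Unicode file, and comparing the result with the encoding named in each file's XML declaration, would show whether the two disagree.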

The call stack is as shown below.

xerces-c_3_1.dll!xercesc_3_1::XMLScanner::scanProlog()  Line 1227 + 0x25 bytes
xerces-c_3_1.dll!xercesc_3_1::IGXMLScanner::scanDocument(const xercesc_3_1::InputSource & src={...})  Line 210
xerces-c_3_1.dll!xercesc_3_1::AbstractDOMParser::parse(const xercesc_3_1::InputSource & source={...})  Line 549
EPConfigTool.dll!XCfgXMLParser::parse()  Line 66 (my application code)

In the code, execution reaches the following else branch:
else
{
 emitError(XMLErrs::InvalidDocumentStructure);
...
}

The function in which the parse fails is shown below:

void XMLScanner::scanProlog()
{
    bool sawDocTypeDecl = false;
    // Get a buffer for whitespace processing
    XMLBufBid bbCData(&fBufMgr);

    //  Loop through the prolog. If there is no content, this could go all
    //  the way to the end of the file.
    try
    {
        while (true)
        {
            const XMLCh nextCh = fReaderMgr.peekNextChar();

            if (nextCh == chOpenAngle)
            {
                //  Ok, it could be the xml decl, a comment, the doc type line,
                //  or the start of the root element.
                if (checkXMLDecl(true))
                {
                    // There shall be at lease --ONE-- space in between
                    // the tag '<?xml' and the VersionInfo.
                    //
                    //  If we are not at line 1, col 6, then the decl was not
                    //  the first text, so its invalid.
                    const XMLReader* curReader = fReaderMgr.getCurrentReader();
                    if ((curReader->getLineNumber() != 1)
                    ||  (curReader->getColumnNumber() != 7))
                    {
                        emitError(XMLErrs::XMLDeclMustBeFirst);
                    }

                    scanXMLDecl(Decl_XML);
                }
                else if (fReaderMgr.skippedString(XMLUni::fgPIString))
                {
                    scanPI();
                }
                 else if (fReaderMgr.skippedString(XMLUni::fgCommentString))
                {
                    scanComment();
                }
                 else if (fReaderMgr.skippedString(XMLUni::fgDocTypeString))
                {
                    if (sawDocTypeDecl) {
                        emitError(XMLErrs::DuplicateDocTypeDecl);
                    }
                    scanDocTypeDecl();
                    sawDocTypeDecl = true;

                    // if reusing grammar, this has been validated already in first scan
                    // skip for performance
                    if (fValidate && fGrammar && !fGrammar->getValidated()) {
                        //  validate the DTD scan so far
                        fValidator->preContentValidation(fUseCachedGrammar, true);
                    }
                }
                else
                {
                    // Assume its the start of the root element
                    return;
                }
            }
            else if (fReaderMgr.getCurrentReader()->isWhitespace(nextCh))
            {
                //  If we have a document handler then gather up the
                //  whitespace and call back. Otherwise just skip over spaces.
                if (fDocHandler)
                {
                    fReaderMgr.getSpaces(bbCData.getBuffer());
                    fDocHandler->ignorableWhitespace
                    (
                        bbCData.getRawBuffer()
                        , bbCData.getLen()
                        , false
                    );
                }
                 else
                {
                    fReaderMgr.skipPastSpaces();
                }
            }
             else
            {
                emitError(XMLErrs::InvalidDocumentStructure);

                // Watch for end of file and break out
                if (!nextCh)
                    break;
                else
                    fReaderMgr.skipPastChar(chCloseAngle);
            }

        }
    }
    catch(const EndOfEntityException&)
    {
        //  We should never get an end of entity here. They should only
        //  occur within the doc type scanning method, and not leak out to
        //  here.
        emitError
        (
            XMLErrs::UnexpectedEOE
            , "in prolog"
        );
    }
}
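
The top-level decision in the loop above depends only on the first character the reader hands back: '<' leads to markup scanning, XML whitespace is skipped, and anything else (including a null character) ends up in the InvalidDocumentStructure branch. The standalone sketch below mirrors just that three-way decision so it can be tried outside the parser. `PrologAction`, `classifyPrologChar` and `checkPrologExamples` are hypothetical names, not Xerces-C++ APIs, and the example characters are assumptions about what a mis-decoded file might produce, not values taken from the failing run.

```cpp
#include <cassert>

// Hypothetical model of the three top-level branches in scanProlog().
enum class PrologAction { ScanMarkup, SkipWhitespace, InvalidDocumentStructure };

// Classifies one decoded character the way the loop's outer if/else chain does.
PrologAction classifyPrologChar(char16_t ch)
{
    if (ch == u'<')
        return PrologAction::ScanMarkup;
    // XML 1.0 whitespace: space, tab, line feed, carriage return.
    if (ch == 0x20 || ch == 0x09 || ch == 0x0A || ch == 0x0D)
        return PrologAction::SkipWhitespace;
    return PrologAction::InvalidDocumentStructure;
}

// Usage example: a well-formed start versus characters that are neither
// '<' nor whitespace.
void checkPrologExamples()
{
    assert(classifyPrologChar(u'<') == PrologAction::ScanMarkup);
    assert(classifyPrologChar(u' ') == PrologAction::SkipWhitespace);
    // Assumed example: a byte-order mark reaching the scanner undecoded.
    assert(classifyPrologChar(0xFEFF) == PrologAction::InvalidDocumentStructure);
    // A null character also falls through to the error branch.
    assert(classifyPrologChar(0x0000) == PrologAction::InvalidDocumentStructure);
}
```

If the character seen at the failure point turns out to be something other than '<' or whitespace, that would suggest the file's encoding is being detected or declared differently than expected, rather than a problem with the document content itself. This remains a hypothesis to check in the debugger.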

It works fine when I move back to version 1.3, but due to various other 
requirements I have to use the new version 3.1 in my application.

Thanks in advance,
Jojo



