Re: Fwd: Why does Xerces modify an invalid XML file while parsing?

neetha patil Mon, 16 Jul 2012 03:06:51 -0700

Dear Alberto,

Thank you for the quick reply.


Since I do not load the grammar (schema) into the parser, it reports errors
such as "Unknown element.." for every XML tag until it reaches the invalid
tag, where it reports 'Expected an attribute name' and aborts parsing, as
you mentioned.

So I set the feature 'XMLUni::fgXercesContinueAfterFatalError' to true and
the complete file was parsed. However, the line containing the invalid tag
was modified as follows:
...
...
 <Services>
     ...
     ...
</Services>
...
<name>
...
...
</name>
...
...

The documentation at http://xml.apache.org/xerces-c-new/program-dom.html
says that setting this feature to true may result in *undetermined* behavior
of the parser. Is there any other way for the parser to report the error and
continue parsing? Also, can we prevent the auto-modification (in this case,
the change from <name="abc"> to <name>)?
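
For illustration only: a tag such as <name="abc"> is a well-formedness
error, which XML 1.0 treats as fatal, so one possible workaround is to
pre-scan the file before parsing and skip the save step if anything
suspicious is found. The sketch below is a rough heuristic under stated
assumptions, not a full well-formedness check, and the names SuspectTag and
findSuspectStartTags are hypothetical (they are not part of Xerces-C++).

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper types and functions; none of this is Xerces-C++ API.
struct SuspectTag {
    std::size_t line;   // 1-based line on which the '<' appears
    std::string text;   // tag text up to and including '>' (or end of input)
};

// Rough approximation of an XML name character (ASCII subset plus any
// non-ASCII byte); the real XML Name production is stricter.
static bool isNameChar(char c) {
    const unsigned char u = static_cast<unsigned char>(c);
    return std::isalnum(u) != 0 || c == '_' || c == ':' || c == '-' ||
           c == '.' || u >= 0x80;
}

// Flags start tags whose element name is missing or is not followed by
// whitespace, '>' or '/', e.g. <name="abc">.
std::vector<SuspectTag> findSuspectStartTags(const std::string& xml) {
    static const char* const kSkips[][2] = {
        {"<!--", "-->"}, {"<![CDATA[", "]]>"}, {"<?", "?>"}};
    std::vector<SuspectTag> hits;
    std::size_t line = 1;
    std::size_t i = 0;
    while (i < xml.size()) {
        if (xml[i] == '\n') { ++line; ++i; continue; }
        if (xml[i] != '<') { ++i; continue; }

        // Skip comments, CDATA sections and processing instructions whole.
        bool skipped = false;
        for (const auto& s : kSkips) {
            const std::string open = s[0];
            const std::string close = s[1];
            if (xml.compare(i, open.size(), open) != 0) continue;
            std::size_t end = xml.find(close, i + open.size());
            end = (end == std::string::npos) ? xml.size() : end + close.size();
            for (std::size_t k = i; k < end; ++k) {
                if (xml[k] == '\n') ++line;   // keep the line count accurate
            }
            i = end;
            skipped = true;
            break;
        }
        if (skipped) continue;

        // End tags and declarations such as <!DOCTYPE are not checked here.
        if (i + 1 < xml.size() && (xml[i + 1] == '/' || xml[i + 1] == '!')) {
            ++i;
            continue;
        }

        std::size_t j = i + 1;
        while (j < xml.size() && isNameChar(xml[j])) ++j;
        const bool hasName = j > i + 1;
        const bool endsOk = j < xml.size() &&
            (std::isspace(static_cast<unsigned char>(xml[j])) != 0 ||
             xml[j] == '>' || xml[j] == '/');
        if (!hasName || !endsOk) {
            const std::size_t gt = xml.find('>', j);
            const std::size_t stop =
                (gt == std::string::npos) ? xml.size() : gt + 1;
            hits.push_back(SuspectTag{line, xml.substr(i, stop - i)});
        }
        ++i;
    }
    return hits;
}
```

The idea would be to read the file into a std::string, call
findSuspectStartTags on it, and leave the original file untouched if the
result is non-empty, instead of relying on the parser to continue past a
fatal error.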

Thanks

Regards,
Neetha

On Mon, Jul 16, 2012 at 2:39 PM, Alberto Massari <
[email protected]> wrote:

>  Hi,
> Xerces doesn't modify your document; you should check the error handler to
> see if the parsing was aborted because of an error. In this case the
> returned DOM tree would be complete up to position of the error.
>
> Alberto
>
> Il 16/07/2012 10:25, neetha patil ha scritto:
>
>  Dear All,
>
> I am using Xercesc_2_8 C++. I provide an XML file (containing an invalid
> tag) to the DOMBuilder parser. I then edit the generated DOM document and
> save the document back to the XML file. The content of this file is now
> truncated from the invalid tag onwards. Why does the parser modify the file
> while parsing? How do I prevent this? That is, I want the parser to report
> the error and continue parsing, but not modify the XML content.
> The following is an excerpt of the XML file:
> ...
> ...
> <Header id="My Project Id" nameStructure="DevName" revision="0"
> version="1">
>      ...
> </Header>
>      ...
>      ...
> <Services>
>      ...
>      ...
> </Services>
> <!-- Invalid tag: No node name -->
> <name="abc">
> ...
> ...
>  Following is the code snippet of the parser:-
> void CHelper::InitDOM()
> {
>         // m_pDomImpl is a pointer to DOMImplementation
>         m_pDomImpl = 0;
>         if(m_pDomImpl == NULL)
>         {
>               XMLPlatformUtils::Initialize();
>               m_pDomImpl = DOMImplementationRegistry::getDOMImplementation( gLS );
>         }
> }
>
> int CHelper::LoadFile(DOMBuilder** pParser, const CString& strXMLFile,
>                       DOMDocument** pDoc, CStringArray& arrError,
>                       bool bValidate, const CString& strSchemaFile)
> {
>        ...
>        if(*pParser == NULL)
>        {
>               *pParser = ((DOMImplementationLS*)m_pDomImpl)->createDOMBuilder(
>                              DOMImplementationLS::MODE_SYNCHRONOUS, 0 );
>               if((*pParser) == NULL)
>               {
>                     return DOM_INITIALIZE_FAILED;
>               }
>
>               (*pParser)->setFeature( XMLUni::fgDOMNamespaces, true );
>               (*pParser)->setFeature( XMLUni::fgXercesSchema, true );
>               (*pParser)->setFeature( XMLUni::fgXercesSchemaFullChecking, true );
>               (*pParser)->setFeature( XMLUni::fgDOMValidation, true);
>               (*pParser)->setFeature( XMLUni::fgXercesCacheGrammarFromParse, true);
>        }
>
>        try
>        {
>               // Declared without parentheses: "CMyDOMErrHandler eh();" would
>               // declare a function rather than construct an object.
>               CMyDOMErrHandler eh;
>               m_arrValidationErrs.RemoveAll();
>
>               // parseURI is a blocking call. If an error handler is set, all
>               // errors are reported first; only then is the next line executed.
>               if(bValidate == true)
>               {
>                    (*pParser)->setErrorHandler(&eh);
>                    (*pParser)->loadGrammar( strSchemaFile, Grammar::SchemaGrammarType, true);
>               }
>               else
>               {
>                    (*pParser)->setErrorHandler(NULL);
>               }
>               *pDoc = (*pParser)->parseURI(strXMLFile);
>              ...
>              ...
>       }
>       catch(...)
>       {
>             ...
>       }
>
>       return SUCCESS;
>
> }
>
> Thank you in advance.
> Regards,
> Neetha
>
>
>
>
