On 17/07/2012 08:21, neetha patil wrote:
Dear All,
Thank you, Alberto, for helping me get rid of the "Unknown element" validation errors. I tried setting the 'XMLUni::fgDOMErrorHandler' parameter on the DOMBuilder parser, but it has no such parameter; also, I am using the DOM document that is returned after parsing.

I forgot that in the new DOM L3 the parameters are set through an intermediate object. The correct call should be something like (*pParser)->getDOMConfiguration()->setParameter. (Double-check it; I may be misremembering the name, but that should give you the idea.)
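
A minimal sketch of the handler contract discussed in this thread: the parser reports each problem to a handler, and the handler's boolean return value decides whether parsing goes on (true) or is aborted (false). This deliberately does not include the Xerces-C headers. `Severity`, `ParseError`, `CollectingHandler` and `runParse` are hypothetical stand-ins invented for illustration, and the "stop at the first fatal error" policy is only an assumption about what a caller might want.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration; these are NOT Xerces-C types.
enum class Severity { Warning, Error, Fatal };

struct ParseError {
    Severity severity;
    std::string message;
};

// Mirrors the shape of a handleError callback: return true to ask the
// parser to keep going, false to ask it to abort.
class CollectingHandler {
public:
    bool handleError(const ParseError& err) {
        errors_.push_back(err);
        // Policy assumed for this sketch: continue after validation-style
        // errors, stop at the first fatal (syntax) error.
        return err.severity != Severity::Fatal;
    }

    const std::vector<ParseError>& errors() const { return errors_; }

private:
    std::vector<ParseError> errors_;
};

// Toy "parser": walks a prepared list of problems and reports each one,
// stopping as soon as the handler returns false.
// Returns the number of problems reported before it stopped.
std::size_t runParse(const std::vector<ParseError>& problems,
                     CollectingHandler& handler) {
    std::size_t reported = 0;
    for (const ParseError& p : problems) {
        ++reported;
        if (!handler.handleError(p)) {
            break;  // handler asked to abort
        }
    }
    return reported;
}
```

With the real library, the same decision would live in a class derived from DOMErrorHandler, inside its handleError(const DOMError&) override. How that handler is registered depends on the Xerces-C version, so check the headers of the release in use.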

The DOMBuilder parser (while parsing against the schema) reports the first schema-related error, then continues parsing and reports any further schema-related errors. Is it possible for the DOMBuilder parser to behave in the same way (and not do any auto-modification) when there are invalid XML statements like the one reported in my previous mail?

No; validation errors are not fatal, while invalid XML syntax can be non-recoverable. In your case the parser tries to find a new synchronization point at the first ">" it finds, but if you had missed the closing quote at the end of an attribute you would be in much bigger trouble.

What I am trying to make you understand is that invalid XML cannot generate a DOM representation that reflects the input XML, because serializing a DOM representation gives you *well-formed* XML, not the original invalid one. The correct thing to do is to reject the input XML you got; if you still want to be able to read and manipulate it, what you call "auto-modification" is the only thing you can do.

Alberto


Regards,
Neetha

On Mon, Jul 16, 2012 at 5:03 PM, Alberto Massari wrote:

    Hi Neetha,
    the correct thing to do would be to not make these calls


    (*pParser)->setFeature( XMLUni::fgXercesSchema, true );
    (*pParser)->setFeature( XMLUni::fgXercesSchemaFullChecking, true );
    (*pParser)->setFeature( XMLUni::fgDOMValidation, true );
    (*pParser)->setFeature( XMLUni::fgXercesCacheGrammarFromParse, true );

    when bValidate == false, as you are asking to validate against a
    schema that you are not going to provide. This will remove the
    "Unknown element" validation errors. As for what you call an
    "auto-modification", it's the correct behaviour: <name="abc"> is
    not a valid XML statement (either the tag name is missing and
    "name" is an attribute, or "name" is the element and it is missing
    a space followed by the attribute name). If you force the parser to
    continue, the DOM tree you get back will be incomplete, at best.
    If you really want to get a DOM tree out of that invalid XML, you
    could attach a W3C DOMErrorHandler (different from the one you
    provided) using
    (*pParser)->setParameter(XMLUni::fgDOMErrorHandler, domErrorHandlerVar)
    This class has a handleError method where you can check what
    happened by examining the DOMError argument and the DOMLocation
    inside it (it contains the DOM node where the error was located).
    If you return "true", the parser will try to continue the parse
    process; if you return "false", parsing will be aborted.

    Alberto


    On 16/07/2012 12:06, neetha patil wrote:
    Dear Alberto,
    Thank you for the quick reply.
    As I do not load the grammar (schema) into the parser, it gives
    errors like "Unknown element.." for all the XML tags until
    it hits the invalid tag, for which it gives the error 'Expected an
    attribute name' and aborts parsing, as you mentioned.
    So I set the feature 'XMLUni::fgXercesContinueAfterFatalError' to
    true and got the complete file parsed. However, the line
    containing the invalid tag was modified as follows:
    ...
    ...
    <Services>
         ...
         ...
    </Services>
    ...
    <name>
    ...
    ...
    </name>
    ...
    ...
    As the documentation at
    http://xml.apache.org/xerces-c-new/program-dom.html
    says, setting this feature to true might result in *undetermined*
    behavior of the parser. Is there any other way for the parser to
    report the error and continue parsing? Also, can we prevent the
    auto-modification (in this case, the modification from
    <name="abc"> to <name>)?
    Thanks
    Regards,
    Neetha

    On Mon, Jul 16, 2012 at 2:39 PM, Alberto Massari wrote:

        Hi,
        Xerces doesn't modify your document; you should check the
        error handler to see if the parsing was aborted because of an
        error. In this case the returned DOM tree would be complete
        up to the position of the error.

        Alberto

        On 16/07/2012 10:25, neetha patil wrote:
        Dear All,

        I am using Xerces-C++ 2.8 (Xercesc_2_8). I provide an XML file
        (containing an invalid tag) to the DOMBuilder parser. I then edit
        the DOM document that is generated and save the document back to
        the XML file. The content of this file is now truncated from the
        invalid tag onwards. Why does the parser modify the file while
        parsing, and how do I prevent it? That is, I want the parser to
        report the error and continue parsing but not modify the XML content.
        Following is a snapshot of the XML file:
        ...
        ...
        <Header id="My Project Id" nameStructure="DevName"
        revision="0" version="1">
        ...
        </Header>
             ...
             ...
        <Services>
        ...
        ...
        </Services>
        <!-- Invalid tag: No node name -->
        <name="abc">
        ...
        ...
         
        Following is the code snippet of the parser:
        void CHelper::InitDOM()
        {
            // m_pDomImpl is a pointer to DOMImplementation
            m_pDomImpl = 0;
            if(m_pDomImpl == NULL)
            {
                XMLPlatformUtils::Initialize();
                m_pDomImpl = DOMImplementationRegistry::getDOMImplementation( gLS );
            }
        }

        int CHelper::LoadFile(DOMBuilder** pParser, const CString& strXMLFile,
                              DOMDocument** pDoc, CStringArray& arrError,
                              bool bValidate, const CString& strSchemaFile)
        {
            ...
            if(*pParser == NULL)
            {
                *pParser = ((DOMImplementationLS*)m_pDomImpl)->createDOMBuilder(
                               DOMImplementationLS::MODE_SYNCHRONOUS, 0 );
                if((*pParser) == NULL)
                {
                    return DOM_INITIALIZE_FAILED;
                }

                (*pParser)->setFeature( XMLUni::fgDOMNamespaces, true );
                (*pParser)->setFeature( XMLUni::fgXercesSchema, true );
                (*pParser)->setFeature( XMLUni::fgXercesSchemaFullChecking, true );
                (*pParser)->setFeature( XMLUni::fgDOMValidation, true );
                (*pParser)->setFeature( XMLUni::fgXercesCacheGrammarFromParse, true );
            }

            try
            {
                // Declares a handler object; "CMyDOMErrHandler eh();" would
                // declare a function instead.
                CMyDOMErrHandler eh;
                m_arrValidationErrs.RemoveAll();

                // parseURI is a blocking call. If an error handler is set, all
                // errors are reported first; only then is the next line executed.
                if(bValidate == true)
                {
                    (*pParser)->setErrorHandler(&eh);
                    (*pParser)->loadGrammar( strSchemaFile,
                                             Grammar::SchemaGrammarType, true );
                }
                else
                {
                    (*pParser)->setErrorHandler(NULL);
                }
                *pDoc = (*pParser)->parseURI(strXMLFile);
                ...
                ...
            }
            catch(...)
            {
                ...
            }

            return SUCCESS;
        }

        Thank you in advance.

        Regards,
        Neetha








