Dear Alberto,

Thank you for your patience and for all the valuable information.

One final question: the link mentioned in my previous mail says that by
setting 'XMLUni::fgXercesContinueAfterFatalError' to true, the parser's
behavior might be *undetermined*. Is the auto-modification (which is being
discussed) one such behaviour? Also, I would be grateful if you could
briefly explain other such behaviours.

Regards,
Neetha

On Wed, Jul 18, 2012 at 1:49 PM, Alberto Massari <[email protected]> wrote:

> On 17/07/2012 08:21, neetha patil wrote:
>
> Dear All,
>
> Thank you Alberto for guiding me to get rid of the "Unknown element"
> validation errors.
>
> I tried setting the parameter 'XMLUni::fgDOMErrorHandler' for the
> DOMBuilder parser but there it had no such parameter and also I am using
> the DOM document which is returned after parsing.
>
>
> I forgot that in the new DOM L3 the parameters are set through an
> intermediate object. The correct call should be something like
> (*pParser)->getDOMConfiguration()->setParameter. (Double check it, I could
> remember the name wrong, but that should give you the idea)
>
>
> DOMBuilder parser (while parsing against the schema) reports the
> first schema-related error and continues with further parsing and reporting
> of other schema-related errors (if any). Is it possible for the
> DOMBuilder parser to behave in the same way (and not do any
> auto-modification) when there are invalid XML statement(s) like the one
> reported in my previous mail?
>
>
> No; validation errors are not fatal while invalid XML syntax could be
> non-recoverable. In your case the parser tries to find a new
> synchronization point at the first ">" it finds, but if you missed the
> closing quote at the end of an attribute you would be in much bigger
> troubles.
>
> What I am trying to make you understand is that an invalid XML cannot
> generate a DOM representation that reflects the input XML, because by
> serializing a DOM representation you will get a *valid* XML, not the
> original invalid one.
> The correct thing to do is reject the input XML you got; if you want to
> still be able to read and manipulate it, what you call
> "auto-modification" is the only thing you can do.
>
> Alberto
>
>
> Regards,
> Neetha
>
> On Mon, Jul 16, 2012 at 5:03 PM, Alberto Massari <[email protected]> wrote:
>
>> Hi Neetha,
>> the correct thing to do would be to not make these calls
>>
>>     (*pParser)->setFeature( XMLUni::fgXercesSchema, true );
>>     (*pParser)->setFeature( XMLUni::fgXercesSchemaFullChecking, true );
>>     (*pParser)->setFeature( XMLUni::fgDOMValidation, true);
>>     (*pParser)->setFeature( XMLUni::fgXercesCacheGrammarFromParse, true);
>>
>> when bValidate == false, as you are asking to validate against a schema
>> that you are not going to provide. This will remove the "Unknown element"
>> validation errors. As for what you say it's an "auto-modification", it's
>> the correct behaviour: <name="abc"> is not a valid XML statement (either
>> there is a missing tag name, and "name" is an attribute, or "name" is the
>> element and it's missing a space followed by the attribute name). If you
>> force the parser to continue, the DOM tree you get back will be
>> incomplete, at best.
>> If you really want to get a DOM tree out of that invalid XML, you could
>> attach a W3C DOMErrorHandler (different from the one you provided) using
>> (*pParser)->setParameter(XMLUni::fgDOMErrorHandler, domErrorHandlerVar)
>> This class has a handleError method where you can check what happened by
>> examining the DOMError argument, and the DOMLocation inside it (it
>> contains the DOM node where the error was located). If you return "true",
>> the parser will try continuing the parse process; if you return "false",
>> parsing will be aborted.
>>
>> Alberto
>>
>>
>> On 16/07/2012 12:06, neetha patil wrote:
>>
>> Dear Alberto,
>>
>> Thank you for the quick reply.
>>
>> As I do not load the grammar (schema) to the parser, it gives error like
>> "Unknown element.."
>> etc., for all the XML tags until it hits the invalid tag for which it
>> gives the error 'Expected an attribute name' and aborts parsing as you
>> mentioned.
>>
>> So I set the feature 'XMLUni::fgXercesContinueAfterFatalError' to true
>> and got the complete file parsed. However the line containing the invalid
>> tag was modified as follows:-
>> ...
>> ...
>> <Services>
>> ...
>> ...
>> </Services>
>> ...
>> <name>
>> ...
>> ...
>> </name>
>> ...
>> ...
>>
>> As it is told in http://xml.apache.org/xerces-c-new/program-dom.html that
>> setting this feature to true might result in an *undetermined* behavior
>> of the parser, is there any other way for the parser to report the error
>> and continue parsing? Also can we prevent the auto-modification (in this
>> case, the modification from <name="abc"> to <name>)?
>>
>> Thanks
>>
>> Regards,
>> Neetha
>>
>> On Mon, Jul 16, 2012 at 2:39 PM, Alberto Massari <[email protected]> wrote:
>>
>>> Hi,
>>> Xerces doesn't modify your document; you should check the error handler
>>> to see if the parsing was aborted because of an error. In this case the
>>> returned DOM tree would be complete up to position of the error.
>>>
>>> Alberto
>>>
>>> On 16/07/2012 10:25, neetha patil wrote:
>>>
>>> Dear All,
>>>
>>> I am using Xercesc_2_8 C++. I provide a XML file (containing an invalid
>>> tag) to the DOMBuilder parser. I then edit the DOM document which is
>>> generated and save the document back to the XML file. The content of
>>> this file is now truncated from the invalid tag onwards. Why does the
>>> parser modify the file while parsing? How do I prevent the same? i.e., I
>>> want the parser to report the error and continue parsing but not modify
>>> the XML content.
>>> Following is the snapshot of the XML file:-
>>> ...
>>> ...
>>> <Header id="My Project Id" nameStructure="DevName" revision="0" version="1">
>>> ...
>>> </Header>
>>> ...
>>> ...
>>> <Services>
>>> ...
>>> ...
>>> </Services>
>>> <!-- Invalid tag: No node name -->
>>> <name="abc">
>>> ...
>>> ...
>>> Following is the code snippet of the parser:-
>>> void CHelper::InitDOM()
>>> {
>>>     // m_pDomImpl is a pointer to DOMImplementation
>>>     m_pDomImpl = 0;
>>>     if(m_pDomImpl == NULL)
>>>     {
>>>         XMLPlatformUtils::Initialize();
>>>         m_pDomImpl = DOMImplementationRegistry::getDOMImplementation( gLS );
>>>     }
>>> }
>>>
>>> int CHelper::LoadFile(DOMBuilder** pParser, const CString& strXMLFile,
>>>     DOMDocument** pDoc, CStringArray& arrError, bool bValidate,
>>>     const CString& strSchemaFile)
>>> {
>>>     ...
>>>     if(*pParser == NULL)
>>>     {
>>>         *pParser = ((DOMImplementationLS*)m_pDomImpl)->createDOMBuilder
>>>             (DOMImplementationLS::MODE_SYNCHRONOUS, 0 );
>>>         if((*pParser) ==NULL)
>>>         {
>>>             return DOM_INITIALIZE_FAILED;
>>>         }
>>>
>>>         (*pParser)->setFeature( XMLUni::fgDOMNamespaces, true );
>>>         (*pParser)->setFeature( XMLUni::fgXercesSchema, true );
>>>         (*pParser)->setFeature( XMLUni::fgXercesSchemaFullChecking, true );
>>>         (*pParser)->setFeature( XMLUni::fgDOMValidation, true);
>>>         (*pParser)->setFeature( XMLUni::fgXercesCacheGrammarFromParse, true);
>>>     }
>>>
>>>     try
>>>     {
>>>         CMyDOMErrHandler eh();
>>>         m_arrValidationErrs.RemoveAll();
>>>
>>>         // parseURI a blocking call. All the errors will be reported first if any error handler is set
>>>         // then only the next line will be executed.
>>>         if(bValidate == true)
>>>         {
>>>             (*pParser)->setErrorHandler(&eh);
>>>             (*pParser)->loadGrammar( strSchemaFile, Grammar::SchemaGrammarType, true);
>>>         }
>>>         else
>>>         {
>>>             (*pParser)->setErrorHandler(NULL);
>>>         }
>>>         *pDoc =(*pParser)->parseURI(strXMLFile);
>>>         ...
>>>         ...
>>>     }
>>>     catch(...)
>>>     {
>>>         ...
>>>     }
>>>
>>>     return SUCCESS;
>>>
>>> }
>>>
>>> Thank you in advance.
>>> Regards,
>>> Neetha
