On 17/07/2012 08:21, neetha patil wrote:
Dear All,
Thank you Alberto for guiding me to get rid of the "Unknown element"
validation errors.
I tried setting the parameter 'XMLUni::fgDOMErrorHandler' on the
DOMBuilder parser, but it has no such parameter; also, I am
using the DOM document that is returned after parsing.
I forgot that in the new DOM L3 API the parameters are set through an
intermediate object. The correct call should be something like
(*pParser)->getDOMConfiguration()->setParameter. (Double-check it; I
may be misremembering the name, but that should give you the idea.)
The DOMBuilder parser (while parsing against the schema) reports the
first schema-related error, continues parsing, and reports any
further schema-related errors. Is it possible for the DOMBuilder
parser to behave in the same way (and not do any auto-modification)
when there are invalid XML statement(s) like the one reported in my
previous mail?
No; validation errors are not fatal, while invalid XML syntax can be
non-recoverable. In your case the parser tries to find a new
synchronization point at the first ">" it finds, but if you had missed
the closing quote at the end of an attribute you would be in much
bigger trouble.
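The resynchronization described above can be illustrated with a small C++ sketch. It is a toy model, not Xerces code: resyncAfterError is a hypothetical helper that assumes recovery simply means skipping to just past the next '>' after the error position.

```cpp
#include <cstddef>
#include <string>

// Toy recovery rule (an assumption for illustration, not Xerces internals):
// after a syntax error at errorPos, resume scanning just past the next '>'.
// Returns std::string::npos when no '>' is left, i.e. nothing to resume at.
std::size_t resyncAfterError(const std::string& xml, std::size_t errorPos) {
    const std::size_t gt = xml.find('>', errorPos);
    return gt == std::string::npos ? std::string::npos : gt + 1;
}
```

With the tag from this thread, resyncAfterError("<name=\"abc\"><child/></name>", 5) returns 12, so scanning resumes at <child/>. With a missing closing quote, as in <item name="abc><child/></item>, this model only notices the error at the second '<' (index 16), and the same rule resumes at index 24, silently dropping <child/>.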
What I am trying to explain is that an invalid XML document cannot
generate a DOM representation that reflects the input, because
serializing a DOM representation always gives you *valid* XML, not the
original invalid one. The correct thing to do is to reject the input
XML you got; if you still want to be able to read and manipulate it,
what you call "auto-modification" is the only thing you can do.
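Why a recovered tree cannot reproduce <name="abc"> can be seen in a toy serializer. The sketch below is not the Xerces DOM; Element and serialize are hypothetical names, and attribute values are written without escaping to keep it short.

```cpp
#include <string>
#include <utility>
#include <vector>

// Toy element node (hypothetical, not a Xerces type).
struct Element {
    std::string name;
    std::vector<std::pair<std::string, std::string>> attributes;
    std::vector<Element> children;
};

// Writes the element as markup. A tag name always follows '<' directly,
// and every start tag is either self-closed or matched by an end tag.
std::string serialize(const Element& e) {
    std::string out = "<" + e.name;
    for (const auto& attr : e.attributes) {
        out += " " + attr.first + "=\"" + attr.second + "\"";
    }
    if (e.children.empty()) {
        return out + "/>";
    }
    out += ">";
    for (const auto& child : e.children) {
        out += serialize(child);
    }
    return out + "</" + e.name + ">";
}
```

An element named name with no attributes serializes to <name/>, and an element named item with the attribute name="abc" serializes to <item name="abc"/>. Given legal element names, every tag written has a name directly after '<', so the malformed <name="abc"> has no tree that would serialize back to it.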
Alberto
Regards,
Neetha
On Mon, Jul 16, 2012 at 5:03 PM, Alberto Massari wrote:
Hi Neetha,
the correct thing to do would be to not make these calls
(*pParser)->setFeature( XMLUni::fgXercesSchema, true );
(*pParser)->setFeature( XMLUni::fgXercesSchemaFullChecking, true );
(*pParser)->setFeature( XMLUni::fgDOMValidation, true);
(*pParser)->setFeature( XMLUni::fgXercesCacheGrammarFromParse, true);
when bValidate == false, as you are asking to validate against a
schema that you are not going to provide. This will remove the
"Unknown element" validation errors. As for what you call
"auto-modification", it's the correct behaviour: <name="abc"> is
not a valid XML statement (either the tag name is missing and
"name" is an attribute, or "name" is the element and it's missing
a space followed by the attribute name). If you force the parser to
continue, the DOM tree you get back will be incomplete, at best.
If you really want to get a DOM tree out of that invalid XML, you
could attach a W3C DOMErrorHandler (different from the one you
provided) using
(*pParser)->setParameter(XMLUni::fgDOMErrorHandler, domErrorHandlerVar)
This class has a handleError method where you can check what
happened by examining the DOMError argument and the DOMLocator
inside it (it contains the DOM node where the error was located).
If you return "true", the parser will try to continue the parse
process; if you return "false", parsing will be aborted.
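The return-value contract of handleError can be modelled without Xerces. In the sketch below, ParseError, ErrorHandler, CollectingHandler and reportAll are hypothetical stand-ins rather than Xerces types; the only thing carried over from the description above is that returning true asks the parser to continue and returning false aborts.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an error report (not the Xerces DOMError).
struct ParseError {
    int line;
    std::string message;
    bool fatal;
};

// Hypothetical handler interface mirroring the true/false contract.
class ErrorHandler {
public:
    virtual ~ErrorHandler() = default;
    // Return true to keep parsing, false to abort.
    virtual bool handleError(const ParseError& error) = 0;
};

// Records every error it is given and answers with a fixed policy.
class CollectingHandler : public ErrorHandler {
public:
    explicit CollectingHandler(bool continueAfterError)
        : continue_(continueAfterError) {}

    bool handleError(const ParseError& error) override {
        errors.push_back(error);
        return continue_;
    }

    std::vector<ParseError> errors;

private:
    bool continue_;
};

// Toy driver: delivers errors in order and stops as soon as the handler
// returns false. Returns how many errors were delivered.
std::size_t reportAll(const std::vector<ParseError>& found,
                      ErrorHandler& handler) {
    std::size_t delivered = 0;
    for (const auto& error : found) {
        ++delivered;
        if (!handler.handleError(error)) {
            break;
        }
    }
    return delivered;
}
```

Given three reported errors, a CollectingHandler(false) receives only the first one, while a CollectingHandler(true) receives all three.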
Alberto
On 16/07/2012 12:06, neetha patil wrote:
Dear Alberto,
Thank you for the quick reply.
As I do not load the grammar (schema) into the parser, it gives
errors like "Unknown element.." etc. for all the XML tags until
it hits the invalid tag, for which it gives the error 'Expected an
attribute name' and aborts parsing, as you mentioned.
So I set the feature 'XMLUni::fgXercesContinueAfterFatalError' to
true and got the complete file parsed. However, the line
containing the invalid tag was modified as follows:
...
...
<Services>
...
...
</Services>
...
<name>
...
...
</name>
...
...
Since http://xml.apache.org/xerces-c-new/program-dom.html says that
setting this feature to true might result in *undetermined* behavior
of the parser, is there any other way for the parser to report
the error and continue parsing? Also, can we prevent the
auto-modification (in this case, the modification from
<name="abc"> to <name>)?
Thanks
Regards,
Neetha
On Mon, Jul 16, 2012 at 2:39 PM, Alberto Massari wrote:
Hi,
Xerces doesn't modify your document; you should check the
error handler to see if the parsing was aborted because of an
error. In this case the returned DOM tree would be complete
up to the position of the error.
Alberto
On 16/07/2012 10:25, neetha patil wrote:
Dear All,
I am using Xerces-C++ 2.8 (Xercesc_2_8). I provide an XML file
(containing an invalid tag) to the DOMBuilder parser. I then edit the
generated DOM document and save it back to the XML file. The
content of this file is now truncated from the invalid tag
onwards. Why does the parser modify the file while parsing?
How do I prevent this? That is, I want the parser to report
the error and continue parsing without modifying the XML content.
The following is a snippet of the XML file:
...
...
<Header id="My Project Id" nameStructure="DevName"
revision="0" version="1">
...
</Header>
...
...
<Services>
...
...
</Services>
<!-- Invalid tag: No node name -->
<name="abc">
...
...
The following is the parser code:
void CHelper::InitDOM()
{
    // m_pDomImpl is a pointer to DOMImplementation
    m_pDomImpl = 0;
    if(m_pDomImpl == NULL)
    {
        XMLPlatformUtils::Initialize();
        m_pDomImpl = DOMImplementationRegistry::getDOMImplementation( gLS );
    }
}
int CHelper::LoadFile(DOMBuilder** pParser, const CString& strXMLFile,
                      DOMDocument** pDoc, CStringArray& arrError,
                      bool bValidate, const CString& strSchemaFile)
{
    ...
    if(*pParser == NULL)
    {
        *pParser = ((DOMImplementationLS*)m_pDomImpl)->createDOMBuilder(
            DOMImplementationLS::MODE_SYNCHRONOUS, 0 );
        if((*pParser) == NULL)
        {
            return DOM_INITIALIZE_FAILED;
        }
        (*pParser)->setFeature( XMLUni::fgDOMNamespaces, true );
        (*pParser)->setFeature( XMLUni::fgXercesSchema, true );
        (*pParser)->setFeature( XMLUni::fgXercesSchemaFullChecking, true );
        (*pParser)->setFeature( XMLUni::fgDOMValidation, true );
        (*pParser)->setFeature( XMLUni::fgXercesCacheGrammarFromParse, true );
    }
    try
    {
        // Declared without parentheses: "eh()" would declare a function.
        CMyDOMErrHandler eh;
        m_arrValidationErrs.RemoveAll();
        // parseURI is a blocking call. If an error handler is set, all
        // errors are reported first; only then is the next line executed.
        if(bValidate == true)
        {
            (*pParser)->setErrorHandler(&eh);
            (*pParser)->loadGrammar( strSchemaFile,
                                     Grammar::SchemaGrammarType, true );
        }
        else
        {
            (*pParser)->setErrorHandler(NULL);
        }
        *pDoc = (*pParser)->parseURI(strXMLFile);
        ...
        ...
    }
    catch(...)
    {
        ...
    }
    return SUCCESS;
}
Thank you in advance.
Regards,
Neetha