> From: Keith Robertson

> Sent: Wednesday, February 27, 2013 12:13 PM
> 

> Is there anything that can be done to "relax" processing of XML
> documents that may contain invalid XML?  I am specifically thinking
> about cases where the XML document contains <> within another element
> and the creator of the XML either didn't escape them or surround them
> with CDATA.
> 

Keith -

Just so that our terminology is consistent: to say that an XML
document is invalid usually means that an attempt has been made to
validate the document against the XML Schema (or DTD) for its
document type and that the validation attempt has failed.

What you are talking about in the case of misplaced angle brackets
is referred to as XML that is not well-formed.  When an XML document
is not well-formed, usually that means that an XML parser will not
accept it, and may even be required to reject it.  For more on this,
see: http://en.wikipedia.org/wiki/Well-formed_document
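
To see the difference in practice, here is a minimal sketch using
the ElementTree parser from the Python standard library.  The helper
name is_well_formed and the sample strings are made up for
illustration; they are not part of generateDS.py.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if ElementTree can parse text, False otherwise."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # The parser rejects input that is not well-formed.
        return False
    return True


# The "<" inside the element content is escaped, so this parses.
good = "<note><body>a &lt; b</body></note>"

# The "<" is left bare, so the parser refuses the document.
bad = "<note><body>a < b</body></note>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

Note that neither document has been validated against a schema
here; the check is only about well-formedness.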

The code generated by generateDS.py uses the ElementTree or lxml
parsers to read in input XML documents.  Take a look at the parseXXX
functions generated at the bottom of the generated Python module.

So, at the least, you need to get those parsers (ElementTree or
lxml) to accept your document.

If you think that you know how to do some automatic code clean-up,
then you might try writing a Python script, and pre-process your
documents with that.
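
As a rough sketch of what such a pre-processing script might look
like, the following escapes a "<" that does not appear to start a
tag and a "&" that does not start an entity reference.  The function
name escape_stray_markup, the regular expressions, and the sample
input are all assumptions for illustration.  The heuristic is crude:
a bare "<" followed directly by a letter (as in "a <b") would still
be taken for a tag, and text inside CDATA sections or comments would
be altered, so treat this as a starting point only.

```python
import re
import xml.etree.ElementTree as ET

# Heuristic: a "<" that is not followed by a name start character,
# "/", "!", or "?" cannot begin a tag, comment, CDATA section, or
# processing instruction, so treat it as literal text.
STRAY_LT = re.compile(r"<(?![A-Za-z_:/!?])")

# Heuristic: an "&" that does not begin a named or numeric
# reference such as "&amp;" or "&#60;" is treated as literal text.
STRAY_AMP = re.compile(
    r"&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)"
)


def escape_stray_markup(text):
    """Escape "&" and "<" characters that look like literal text."""
    # Escape "&" first so the "&lt;" added next is left alone.
    text = STRAY_AMP.sub("&amp;", text)
    return STRAY_LT.sub("&lt;", text)


# Hypothetical input with unescaped characters in element content.
raw = "<item><desc>use <> to compare, e.g. a < b & c</desc></item>"

cleaned = escape_stray_markup(raw)
root = ET.fromstring(cleaned)
print(root.find("desc").text)
```

A bare ">" is left alone because it is legal in element content.
The cleaned text could then be handed to the generated parseXXX
functions in place of the original file.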

By the way, the decision to require that XML parsers reject
documents that are not well-formed seems to have been a conscious
one.  I believe that there was quite a bit of discussion and
controversy about that decision.  One rationale is that, because
XML parsers must reject documents that are not well-formed, you
will be forced to push back against the producers of the document
and "encourage" them to fix those documents.  You will have to make
up your own mind about this policy.  But, basically, it's not
optional.

- Dave


 
--

Dave Kuhlman
http://www.rexx.com/~dkuhlman 


_______________________________________________
generateds-users mailing list
generateds-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/generateds-users
