On 27/10/2017 21:47, RS wrote:
On 27/10/2017 19:06, Bernard Peek wrote:

It is then up to the calling script (get_iplayer.pl) to decide what
action to take in response to the action taken by the parser. It is not
adequate just to allow XML::LibXML to display "parser error" and take
no further action.

Even though that's what the XML standard says IS the correct action?


PMFJI

I built data transfer standards for the UK's outdoor advertising industry. I deliberately chose XML-based standards because they enabled automatic validation of data files. The standards were quite specific: all automated systems were required to refuse any file that did not conform to the DTD I had on my web server. Data providers were expected to pre-validate any files they sent to any other company.

This was my main argument for switching from flat files to XML.

If you are both right about the strictness of the standard, and I have to defer to your superior knowledge, why does XML::LibXML have options for recovery and validation?

If you are particularly masochistic you can write code to recover data from files that you already know are corrupt. Sometimes you can't just throw the problem back at the data provider. The nice thing about failing to validate is that it's a boolean value. It unambiguously points the finger of blame at the data provider. Whether you can use that to force them to fix the problem is a political issue, not a technical one.
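
To illustrate the "boolean value" point, here is a minimal sketch of an accept-or-reject gate. It is only an illustration: it uses Python's standard-library expat parser rather than XML::LibXML, it checks well-formedness only (the standard library does not validate against a DTD), and the names is_well_formed and sample documents are made up for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(data: bytes) -> bool:
    """Return True if data parses as well-formed XML, False otherwise.

    This is a well-formedness check only, not DTD validation.
    """
    try:
        ET.fromstring(data)
    except ET.ParseError:
        return False
    return True


# Hypothetical sample documents, not taken from any real feed.
good = b"<programme><pid>b0000000</pid></programme>"
bad = b"<programme><pid>b0000000</programme>"  # <pid> is never closed

for name, doc in (("good", good), ("bad", bad)):
    verdict = "accept" if is_well_formed(doc) else "reject"
    print(name, verdict)
```

The caller gets a yes/no answer and nothing in between, which is what makes it easy to hand the file back to whoever produced it.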

According to
http://search.cpan.org/dist/XML-LibXML/lib/XML/LibXML/Parser.pod#PARSER_OPTIONS and
http://search.cpan.org/dist/XML-LibXML/lib/XML/LibXML/Error.pod
it also has a choice of Verbose and Quiet error handlers. Authors can use their own error handlers, or remove the error handler altogether. An example given is recovery from a missing closing tag. I have not seen a definition of fatal error. Is a spurious NUL a fatal error? I suspect it is less serious than a missing closing tag. It is easy to recover from; you just ignore it. Unless anyone tells me otherwise, I would have thought non-matching tags would be more likely to be a fatal error.
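
As a rough sketch of the "just ignore it" idea for a spurious NUL, the following compares a strict parse with a parse that strips NUL bytes first. The assumptions are mine: it uses Python's standard-library expat parser, not XML::LibXML's recover option, the function names are hypothetical, and stripping bytes like this is only safe for ASCII-compatible encodings such as UTF-8 (it would corrupt UTF-16 input).

```python
import xml.etree.ElementTree as ET


def parse_strict(data: bytes) -> ET.Element:
    """Parse with no recovery; any well-formedness error raises ParseError."""
    return ET.fromstring(data)


def parse_ignoring_nuls(data: bytes) -> ET.Element:
    """Hypothetical recovery step: drop NUL bytes, then parse strictly.

    Assumes an ASCII-compatible encoding such as UTF-8.
    """
    return ET.fromstring(data.replace(b"\x00", b""))


# Made-up sample with a stray NUL inside the element text.
with_nul = b"<pid>b0\x00123</pid>"

try:
    parse_strict(with_nul)
except ET.ParseError as err:
    print("strict parse failed:", err)

print("recovered text:", parse_ignoring_nuls(with_nul).text)
```

Note that the caller, not the parser, decides to attempt the recovery here, and the same trick does nothing for non-matching tags, which still fail after the NULs are removed.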

It must be remembered that an important function of XML, in contrast to other markup languages, is that it is human readable as well as machine readable.


Making XML human-readable was a compromise. The drawback is that it encourages tinkerers to believe that they can or should attempt to fix problems when, in most cases, the only sensible thing to do is kick them back to the provider. What you end up with is multiple people in different places putting in lots of time fixing someone else's mistakes. Allowing that to continue is a disservice to other data users and should be a last resort. Just because something is doable doesn't make doing it a good idea.

--
Bernard Peek
b...@shrdlu.com


_______________________________________________
get_iplayer mailing list
get_iplayer@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/get_iplayer
