On 27/10/2017 21:47, RS wrote:
On 27/10/2017 19:06, Bernard Peek wrote:
It is then up to the calling script (get_iplayer.pl) to decide what
action to take in response to the action taken by the parser. It is not
adequate just to allow XML::LibXML to display "parser error" and take
no further action.
Even though that's what the XML standard says IS the correct action?
PMFJI
I built data transfer standards for the UK's outdoor advertising
industry. I deliberately chose to use XML-based standards because they
enabled automatic validation of data files. The standards were quite
specific. All automated systems were required to refuse any files not
compatible with the DTD I had on my web server. Data providers were
expected to prevalidate any files they sent to any other company.
This was my main argument for switching to XML from flat-files.
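A prevalidation check of that sort needs only a few lines of
XML::LibXML. This is just a sketch: the DTD URL and the command-line
handling are placeholders, not the real standard.

    use strict;
    use warnings;
    use XML::LibXML;

    # Placeholder DTD location -- substitute the published one.
    my $dtd = XML::LibXML::Dtd->new( undef, 'http://example.com/standard.dtd' );

    # A file that is not even well-formed is rejected outright.
    my $doc = eval { XML::LibXML->load_xml( location => $ARGV[0] ) }
        or die "Rejected: not well-formed XML: $@";

    # is_valid() returns a plain boolean: the file conforms to the DTD
    # or it doesn't.
    if ( $doc->is_valid($dtd) ) {
        print "OK\n";
    }
    else {
        die "Rejected: fails DTD validation\n";
    }

Swapping is_valid() for validate() gives the reason for the failure as
well, if the provider wants chapter and verse.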
If you are both right about the strictness of the standard, and I have
to defer to your superior knowledge, why does XML::LibXML have options
for recovery and validation?
If you are particularly masochistic you can write code to recover data
from files that you already know are corrupt. Sometimes you can't just
throw the problem back at the data provider. The nice thing about
validation is that its outcome is a boolean: the file either conforms
or it doesn't, which unambiguously points the finger of blame at the
data provider. Whether you can use
that to force them to fix the problem is a political issue not a
technical one.
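For what it's worth, that sort of salvage job is exactly what the
recover parser option is for. A rough sketch only; the filename and
the XPath query are made up for illustration.

    use strict;
    use warnings;
    use XML::LibXML;

    # recover => 1 reports well-formedness errors but carries on parsing;
    # recover => 2 does the same without the warnings.
    my $parser = XML::LibXML->new( recover => 2 );

    my $doc = $parser->load_xml( location => 'known_bad.xml' );

    # Whatever libxml2 managed to salvage is now in $doc, for better or worse.
    print $_->textContent, "\n" for $doc->findnodes('//title');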
According to
http://search.cpan.org/dist/XML-LibXML/lib/XML/LibXML/Parser.pod#PARSER_OPTIONS
and
http://search.cpan.org/dist/XML-LibXML/lib/XML/LibXML/Error.pod
it also has a choice of Verbose and Quiet error handlers. Authors can
use their own error handlers, or remove the error handler altogether.
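On that point: with the structured error handling used by recent
XML::LibXML versions, the exception you catch is an XML::LibXML::Error
object rather than a bare string, so the calling script has something
better to go on than "parser error". A sketch (the file comes from the
command line here):

    use strict;
    use warnings;
    use XML::LibXML;

    my $file = shift @ARGV;

    my $doc = eval { XML::LibXML->load_xml( location => $file ) };
    if ( my $err = $@ ) {
        if ( ref $err and $err->isa('XML::LibXML::Error') ) {
            # Structured error: line, column and message are all available,
            # so the caller can decide what to do next.
            warn sprintf "parse failed at line %d, column %d: %s\n",
                $err->line, $err->column, $err->message;
        }
        else {
            warn "parse failed: $err";
        }
    }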
An example given is recovery from a missing closing tag. I have not
seen a definition of fatal error. Is a spurious NUL a fatal error? I
suspect it is less serious than a missing closing tag. It is easy to
recover from; you just ignore it. Subject to what anyone may tell me,
I would have thought non-matching tags would be more likely to be a
fatal error.
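In the NUL case, "just ignore it" can be done before libxml2 ever sees
the bytes, which sidesteps the question of how fatal the parser thinks
it is. A sketch; the filename is a placeholder.

    use strict;
    use warnings;
    use XML::LibXML;

    # Read the raw bytes, drop any stray NULs, then parse the cleaned string.
    open my $fh, '<:raw', 'feed.xml' or die "open: $!";
    my $xml = do { local $/; <$fh> };
    close $fh;

    $xml =~ tr/\x00//d;    # discard spurious NUL bytes

    my $doc = XML::LibXML->load_xml( string => $xml );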
It must be remembered that an important function of XML, in contrast
to other markup languages, is that it is human-readable as well as
machine-readable.
Making XML human-readable was a compromise. The drawback is that it
encourages tinkerers to believe that they can or should attempt to fix
problems when, in most cases, the only sensible thing to do is kick them
back to the provider. What you end up with is multiple people in
different places putting in lots of time fixing someone else's mistakes.
Allowing that to continue is a disservice to other data users and should
be a last resort. Just because something is doable doesn't make doing it
a good idea.
--
Bernard Peek
b...@shrdlu.com