On 2017-10-27 21:47, RS wrote:
If you are both right about the strictness of the standard, and I have to defer to your superior knowledge, why does XML::LibXML have options for recovery and validation? According to http://search.cpan.org/dist/XML-LibXML/lib/XML/LibXML/Parser.pod#PARSER_OPTIONS and http://search.cpan.org/dist/XML-LibXML/lib/XML/LibXML/Error.pod it also has a choice of Verbose and Quiet error handlers. Authors can use their own error handlers, or remove the error handler altogether.
The most obvious reason would be to use XML::LibXML as a validator, before releasing files you were then certain were properly formed. I think 'recovery' in this sense merely means the parser returns an error code; there's nothing to suggest that you can then go on and make data-extraction calls against the XML file... you'll just keep getting the error code.
An example given is recovery from a missing closing tag.
- which is no use in this situation when the NUL occurs before any of the
data you're interested in.
I have not seen a definition of fatal error. Is a spurious NUL a fatal error?
I think so, according to that original Wikipedia article, because it said that a NUL is one of the only characters that can never be valid in an XML document.
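For what it's worth, the character rule in the XML 1.0 specification (the Char production) is short enough to write out as a check. This is only a sketch in Python, not the Perl that get_iplayer uses, and the function name is_xml10_char is my own invention:

```python
def is_xml10_char(ch: str) -> bool:
    """Return True if ch is permitted by the XML 1.0 Char production."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)           # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF         # most of the Basic Multilingual Plane
        or 0xE000 <= cp <= 0xFFFD       # skips the surrogate block, U+FFFE, U+FFFF
        or 0x10000 <= cp <= 0x10FFFF    # supplementary planes
    )


# NUL (U+0000) falls outside every permitted range.
nul_allowed = is_xml10_char("\x00")
```

On that reading NUL is not the only excluded character (most of the other C0 control characters are excluded too), but it is certainly among them.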
I suspect it is less serious than a missing closing tag.
Not if the parser, knowing it can NEVER be valid, stops right there.
It is easy to recover from; you just ignore it.
There's no reason to ignore it. By definition, finding one means that you do not have a valid XML file.
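To make the two positions concrete, here is a rough sketch using Python's standard-library parser rather than XML::LibXML, so it says nothing about how LibXML's recover option behaves. The subtitle fragment and the function names are made up for illustration:

```python
import xml.etree.ElementTree as ET

# Hypothetical subtitle fragment with a stray NUL byte in the text.
raw = b"<tt><p>Hello\x00 world</p></tt>"


def parse_strict(data: bytes):
    """Parse the data as-is; return (root, None) or (None, error message)."""
    try:
        return ET.fromstring(data), None
    except ET.ParseError as err:
        return None, str(err)


def parse_after_stripping_nul(data: bytes):
    """The 'just ignore it' approach: drop NUL bytes, then parse strictly.

    Assumes a single-byte-compatible encoding such as UTF-8; this would
    wreck a UTF-16 file, where NUL bytes are part of ordinary characters.
    """
    return ET.fromstring(data.replace(b"\x00", b""))


root, error = parse_strict(raw)            # strict parse is rejected
cleaned = parse_after_stripping_nul(raw)   # pre-filtered input parses
```

The strict parse fails outright, while the pre-filtered copy parses. Whether silently filtering the input is acceptable is exactly the point in dispute here.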
Subject to what anyone may tell me, I would have thought non-matching tags would be more likely to be a fatal error.
Well, HTML - which has looser parsing criteria - does manage that sort of thing. But HTML is not XML.
It must be remembered that an important function of XML, in contrast to other mark up languages, is that it is human readable as well as machine readable.
OTOH the designers of XML clearly felt that well-formedness was just as important.
Error recovery must always be appropriate for the importance of integrity of the data and the probability of errors. I can understand there are applications where strict compliance is necessary, but subtitles do not seem to me to be one of them.
Then take that up with the BBC and tell them that their choice of XML for these files is inappropriate.
Subtitles for this film used to work with XML::Simple. A problem only occurred with the move to XML::LibXML to support coloured subtitles.
Surely the problem is that this specific XML file is corrupted? Are you finding that every single XML file is corrupt?

--
Jeremy Nicoll - my opinions are my own