On Thursday, 18 February 2016 at 17:26:30 UTC, Adam D. Ruppe
wrote:
On Thursday, 18 February 2016 at 16:56:08 UTC, Robert burner
Schadek wrote:
Unix file says it is a UTF-8 encoded file, but no BOM is
present.
the hex dump is "3C 66 6F 6F 3E C2 80 3C 2F 66 6F 6F 3E"
Gah, I should have read this before replying... well, that does
appear to be valid UTF-8... why is it throwing an exception
then?
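I'm pretty sure that byte stream *is* actually well-formed XML
1.0 and should pass UTF-8 validation as well as the XML
well-formedness check: C2 80 is the two-byte encoding of U+0080,
a C1 control character that the XML 1.0 Char production permits.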
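As a rough sanity check of that claim, here is a small Python
sketch (Python only because it is quick to run; it is not the D
code under discussion). It decodes the bytes from the hex dump as
strict UTF-8 and tests every code point against the XML 1.0 Char
production. The helper name is_xml10_char is my own, not part of
any library.

```python
# Bytes from the hex dump quoted above.
data = bytes.fromhex("3C 66 6F 6F 3E C2 80 3C 2F 66 6F 6F 3E")


def is_xml10_char(cp):
    """Return True if cp matches the XML 1.0 'Char' production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


# Strict decoding raises UnicodeDecodeError on malformed UTF-8.
text = data.decode("utf-8", errors="strict")

# True only if every decoded code point is a legal XML 1.0 character.
all_allowed = all(is_xml10_char(ord(c)) for c in text)
```

The decode succeeds and all_allowed comes out true, since U+0080
falls inside the 0x20-0xD7FF range. XML 1.1 is stricter here: it
lists U+0080 as a restricted character that may only appear as a
character reference.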
Regarding control characters: If you give me a complete sample
file, I can run it through Mozilla's UTF stream conversion and/or
XML parsing code (via either SAX or DOMParser) to tell you how
that reacts as a reference. Mozilla supports XML 1.0, but not
1.1.
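In the meantime, a similar reference check can be sketched with
the SAX interface in Python's standard library. This is a
stand-in for illustration, not Mozilla's code, and the
TextCollector and parse_sample names are made up for this
example.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Records element names and character data seen during parsing."""

    def __init__(self):
        super().__init__()
        self.names = []
        self.chunks = []

    def startElement(self, name, attrs):
        self.names.append(name)

    def characters(self, content):
        # The parser may deliver text in several pieces.
        self.chunks.append(content)


def parse_sample(data):
    """Parse raw bytes; raises SAXParseException if not well-formed."""
    handler = TextCollector()
    xml.sax.parseString(data, handler)
    return handler.names, "".join(handler.chunks)


# The thirteen bytes from the hex dump in this thread.
names, text = parse_sample(bytes.fromhex("3C666F6F3EC2803C2F666F6F3E"))
```

With the default error handler, a well-formedness error raises
xml.sax.SAXParseException, so getting a result back at all means
this parser accepted the input.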