On 1/16/2012 6:53 PM, Bjoern Hoehrmann wrote:
> * Christopher J. Madsen wrote:
>> My repo is https://github.com/madsen/io-html but since it's built with
>> dzil, I also made a gist of the processed module to make it easier to
>> read the docs: https://gist.github.com/1623654
>
> It is not clear to me that the combination would actually conform to the
> "HTML5" proposal, for instance, HTTP::Message seems to recognize UTF-32
> BOMs, but as I recall the "HTML5" proposal does not allow that.

Dropping support for UTF-32 from HTTP::Message is a separate issue from
removing HTML::Parser. I've got no comment on that.

> Your UTF-8 validation code seems wrong to me, you consider the sequence
> F0 80 to be incomplete, but it's actually invalid, same for ED 80, see
> the chart in <http://bjoern.hoehrmann.de/utf-8/decoder/dfa/#design>.

I guess the RE could be improved, but I'm not sure it's worth the effort
and added complication to catch a tiny fraction of false positives.

> Anyway, if people think this is the way to go, maybe HTTP::Message can
> adopt the Content-Type header charset extraction tests in HTML::Encoding
> so they don't get lost as my module becomes redundant?

I thought it already did that?

--
Chris Madsen p...@cjmweb.net
-------------------- http://www.cjmweb.net --------------------
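
For reference, the difference between an "incomplete" and an "invalid" UTF-8 prefix raised above can be sketched as follows. This is an illustrative Python sketch, not the regular expression used in IO::HTML; the function name `classify` is hypothetical, and the byte ranges are taken from the well-formed byte sequence table in RFC 3629. Under that table the byte after F0 must be in 90..BF, so F0 80 can never be completed, and the byte after ED must be in 80..9F, so ED A0 is likewise rejected.

```python
def classify(data: bytes) -> str:
    """Return 'valid', 'incomplete', or 'invalid' for a byte string.

    'incomplete' means the data ends inside a sequence that could still
    be completed; 'invalid' means no continuation could make it valid.
    """
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead <= 0x7F:              # ASCII, single byte
            i += 1
            continue
        # need = number of continuation bytes; lo..hi = range of the
        # FIRST continuation byte (later ones are always 80..BF).
        if 0xC2 <= lead <= 0xDF:
            need, lo, hi = 1, 0x80, 0xBF
        elif lead == 0xE0:            # excludes overlong 3-byte forms
            need, lo, hi = 2, 0xA0, 0xBF
        elif lead == 0xED:            # excludes surrogates D800..DFFF
            need, lo, hi = 2, 0x80, 0x9F
        elif 0xE1 <= lead <= 0xEF:
            need, lo, hi = 2, 0x80, 0xBF
        elif lead == 0xF0:            # excludes overlong 4-byte forms
            need, lo, hi = 3, 0x90, 0xBF
        elif lead == 0xF4:            # excludes values above U+10FFFF
            need, lo, hi = 3, 0x80, 0x8F
        elif 0xF1 <= lead <= 0xF3:
            need, lo, hi = 3, 0x80, 0xBF
        else:
            return "invalid"          # 80..C1 and F5..FF never lead
        for k in range(1, need + 1):
            if i + k >= n:
                return "incomplete"   # ran out of input mid-sequence
            if not (lo <= data[i + k] <= hi):
                return "invalid"
            lo, hi = 0x80, 0xBF       # remaining bytes: plain continuation
        i += need + 1
    return "valid"
```

With this sketch, `classify(b"\xf0\x80")` reports the prefix as invalid, while `classify(b"\xf0\x90")` reports it as incomplete, since further bytes could still complete it.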