On 1/16/2012 6:53 PM, Bjoern Hoehrmann wrote:
> * Christopher J. Madsen wrote:
>> My repo is https://github.com/madsen/io-html but since it's built with
>> dzil, I also made a gist of the processed module to make it easier to
>> read the docs: https://gist.github.com/1623654

> It is not clear to me that the combination would actually conform to the
> "HTML5" proposal. For instance, HTTP::Message seems to recognize UTF-32
> BOMs, but as I recall the "HTML5" proposal does not allow that.

Dropping support for UTF-32 from HTTP::Message is a separate issue from
removing HTML::Parser.  I've got no comment on that.
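For reference, here is a minimal sketch of BOM sniffing that follows the HTML5 encoding-sniffing rule as I understand it: only the UTF-8 and UTF-16 BOMs are recognized. It is written in Python rather than Perl, and `sniff_bom` is a hypothetical name, not an API of HTTP::Message or IO::HTML. Note that a UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM, so under this rule it is reported as UTF-16LE.

```python
from typing import Optional

# BOMs recognized by this sketch; UTF-32 BOMs are deliberately absent.
_BOMS = (
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xfe\xff", "UTF-16BE"),
    (b"\xff\xfe", "UTF-16LE"),
)


def sniff_bom(data: bytes) -> Optional[str]:
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return encoding
    return None


if __name__ == "__main__":
    print(sniff_bom(b"\xef\xbb\xbf<html>"))             # UTF-8
    print(sniff_bom(b"\xff\xfe\x00\x00<\x00\x00\x00"))  # UTF-16LE (UTF-32LE BOM not special-cased)
    print(sniff_bom(b"\x00\x00\xfe\xff"))               # None (UTF-32BE BOM ignored)
```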

> Your UTF-8 validation code seems wrong to me: you consider the sequence
> F0 80 to be incomplete, but it's actually invalid (the same goes for
> ED 80). See the chart in <http://bjoern.hoehrmann.de/utf-8/decoder/dfa/#design>.

I guess the regex could be improved, but I'm not sure it's worth the effort
and added complexity to catch a tiny fraction of false positives.
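To make the incomplete-versus-invalid distinction concrete, here is a minimal sketch (in Python, not the module's actual Perl regex) that classifies a buffer using the second-byte ranges from the Unicode Standard's table of well-formed UTF-8 byte sequences. `classify_utf8` is a hypothetical name. Under these rules F0 80 is invalid, because a sequence starting with F0 needs a second byte in 90..BF, whereas F0 90 at the end of the buffer is merely incomplete.

```python
def classify_utf8(data: bytes) -> str:
    """Return 'valid', 'incomplete' (a well-formed but truncated sequence at
    the end of the buffer) or 'invalid' (ill-formed UTF-8)."""
    i, n = 0, len(data)
    while i < n:
        lead = data[i]
        if lead < 0x80:
            i += 1
            continue
        # Continuation-byte count and allowed range of the *second* byte.
        if 0xC2 <= lead <= 0xDF:
            need, lo, hi = 1, 0x80, 0xBF
        elif lead == 0xE0:
            need, lo, hi = 2, 0xA0, 0xBF  # rules out overlong forms
        elif 0xE1 <= lead <= 0xEC or 0xEE <= lead <= 0xEF:
            need, lo, hi = 2, 0x80, 0xBF
        elif lead == 0xED:
            need, lo, hi = 2, 0x80, 0x9F  # rules out UTF-16 surrogates
        elif lead == 0xF0:
            need, lo, hi = 3, 0x90, 0xBF  # rules out overlong forms
        elif 0xF1 <= lead <= 0xF3:
            need, lo, hi = 3, 0x80, 0xBF
        elif lead == 0xF4:
            need, lo, hi = 3, 0x80, 0x8F  # rules out code points above U+10FFFF
        else:
            return "invalid"  # C0, C1, F5..FF, or a stray continuation byte
        for k in range(1, need + 1):
            if i + k >= n:
                return "incomplete"  # input ended inside a valid prefix
            low, high = (lo, hi) if k == 1 else (0x80, 0xBF)
            if not low <= data[i + k] <= high:
                return "invalid"
        i += need + 1
    return "valid"


if __name__ == "__main__":
    for sample in (b"\xf0\x80", b"\xf0\x90", "\u00e9".encode("utf-8")):
        print(sample, classify_utf8(sample))
```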

> Anyway, if people think this is the way to go, maybe HTTP::Message can
> adopt the Content-Type header charset extraction tests in HTML::Encoding
> so they don't get lost as my module becomes redundant?

I thought it already did that?
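For illustration only, here is a sketch of the kind of table-driven check that Content-Type charset extraction could be tested against, using Python's standard `email.message.Message.get_content_charset()` as a stand-in parser. `extract_charset` is a hypothetical helper, and the cases are made up for this sketch; they are not the actual HTML::Encoding or HTTP::Message test data.

```python
from email.message import Message
from typing import Optional


def extract_charset(content_type: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


# Hypothetical cases, not taken from any module's test suite.
CASES = [
    ("text/html; charset=UTF-8", "utf-8"),
    ('text/html; charset="ISO-8859-1"', "iso-8859-1"),
    ("text/html;charset=windows-1252", "windows-1252"),
    ("text/html", None),
]

if __name__ == "__main__":
    for header, expected in CASES:
        got = extract_charset(header)
        print(repr(header), repr(got), "ok" if got == expected else "MISMATCH")
```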

-- 
Chris Madsen                                          p...@cjmweb.net
  --------------------  http://www.cjmweb.net  --------------------
