http://dev.w3.org/html5/spec/Overview.html#determining-the-character-encoding
specifies how to pre-scan an HTML document to sniff the charset.
Would it not be simpler to just implement the algorithm as specified
instead of using a generic parser?  The use of HTML::Parser to
implement this sniffing was just me trying a shortcut since
HTML::Parser seemed to implement a superset of these rules.
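
To make the idea concrete, here is a rough sketch of what a direct
pre-scan could look like.  It is written in Python purely for
illustration and is not a full implementation of the spec: it skips
comments and reads <meta> attributes, but ignores the BOM check,
encoding label aliases, and the spec's other special cases, and a ">"
inside a quoted attribute value would confuse it.  The names
sniff_charset and limit are made up for this example, and the
1024-byte default is an assumption; check the spec for the window it
actually recommends.

```python
import re

# Attribute name, optionally followed by a quoted or unquoted value.
_ATTR = re.compile(
    rb"""([^\s=/>]+)
         (?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]*)))?""",
    re.VERBOSE,
)
# "charset=..." inside the content attribute of an http-equiv meta.
_CHARSET_IN_CONTENT = re.compile(rb"""charset\s*=\s*["']?([^\s"';]+)""", re.I)
_AFTER_META = (b" ", b"\t", b"\n", b"\r", b"\f", b"/")


def sniff_charset(data, limit=1024):
    """Return the lowercased charset from an early <meta>, or None."""
    head = data[:limit]
    pos = 0
    while True:
        pos = head.find(b"<", pos)
        if pos < 0:
            return None
        if head.startswith(b"<!--", pos):
            # Skip the whole comment; "<!-->" counts as a complete one.
            end = head.find(b"-->", pos + 2)
            if end < 0:
                return None
            pos = end + 3
            continue
        is_meta = (head[pos:pos + 5].lower() == b"<meta"
                   and head[pos + 5:pos + 6] in _AFTER_META)
        if not is_meta:
            pos += 1
            continue
        end = head.find(b">", pos)
        if end < 0:
            return None
        attrs = {}
        for m in _ATTR.finditer(head[pos + 5:end]):
            value = m.group(2) or m.group(3) or m.group(4) or b""
            # The first occurrence of an attribute wins.
            attrs.setdefault(m.group(1).lower(), value)
        if attrs.get(b"charset"):
            return attrs[b"charset"].decode("ascii", "replace").lower()
        if attrs.get(b"http-equiv", b"").lower() == b"content-type":
            m = _CHARSET_IN_CONTENT.search(attrs.get(b"content", b""))
            if m:
                return m.group(1).decode("ascii", "replace").lower()
        pos = end + 1
```

Calling sniff_charset() on the raw bytes of a response body gives the
declared charset, or None when nothing is declared inside the scanned
window, in which case the caller falls back to whatever default it
already uses.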

--Gisle
