http://dev.w3.org/html5/spec/Overview.html#determining-the-character-encoding specifies how to pre-scan an HTML document to sniff the charset. Would it not be simpler to just implement the algorithm as specified instead of using a generic parser? Using HTML::Parser for this sniffing was just me trying a shortcut, since HTML::Parser seemed to implement a superset of these rules.
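
To give a rough idea of how small a dedicated pre-scan could be, here is a simplified sketch in Python (not Perl, and not the full algorithm from the spec). The function name `sniff_charset`, the 1024-byte `limit` default, and the regex-based matching are all assumptions made for illustration; the spec describes a byte-by-byte attribute parser with more special cases (for example, how a UTF-16 label found in a meta element is treated), which this sketch does not handle.

```python
import re

# Hypothetical names throughout; a simplified approximation of a charset
# pre-scan, not a complete implementation of the spec's algorithm.
_BOMS = (
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16be"),
    (b"\xff\xfe", "utf-16le"),
)
_COMMENT = re.compile(rb"<!--.*?-->", re.DOTALL)
_META = re.compile(rb"<meta[\s/][^>]*>", re.IGNORECASE)
_CHARSET = re.compile(rb"charset\s*=\s*[\"']?\s*([^\s\"'/>;]+)", re.IGNORECASE)


def sniff_charset(data: bytes, limit: int = 1024):
    """Return a lowercase charset label guessed from the head of data, or None."""
    # A byte order mark takes precedence over anything declared in markup.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    # Only look at the first `limit` bytes, and drop closed comments so that
    # a <meta> inside a comment is not picked up.
    head = _COMMENT.sub(b"", data[:limit])
    for tag in _META.finditer(head):
        match = _CHARSET.search(tag.group(0))
        if match:
            return match.group(1).decode("ascii", "replace").lower()
    return None
```

This covers both the `<meta charset=...>` form and the `http-equiv` form with `charset=` inside `content`, and returns `None` when nothing is declared so the caller can fall back to other heuristics.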
--Gisle