On Tue, Sep 22, 2015 at 12:13 PM, Ms2ger <ms2...@gmail.com> wrote:
> As HTMLBinaryInputStream.__init__ already calls detectEncoding(), the
> UTF-16 BOM is no longer in the stream when HTMLSource.parse calls
> detectEncoding() manually. This causes detectEncoding() not to find
> anything interesting, and return windows-1252. Attached is a patch to
> remove the manual handling, instead depending on HTMLParser.parse to
> handle the encoding detection itself.
>
> Could you apply the patch to <https://hg.csswg.org/dev/w3ctestlib>? I
> don't believe I have push access myself.
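
For anyone following along, the failure mode described above can be reproduced in a minimal sketch. It assumes only that BOM detection consumes the BOM from the stream. The names `detect_encoding`, `BOMS` and `DEFAULT_ENCODING` are hypothetical stand-ins for illustration, not html5lib's actual code.

```python
import codecs
import io

# Hypothetical stand-ins for illustration; this is NOT html5lib's implementation.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]
DEFAULT_ENCODING = "windows-1252"


def detect_encoding(stream):
    """Sniff a BOM at the current position and consume it if one is found."""
    start = stream.tell()
    head = stream.read(3)
    for bom, name in BOMS:
        if head.startswith(bom):
            # Leave the stream positioned just past the BOM.
            stream.seek(start + len(bom))
            return name
    # No BOM here: rewind and fall back to the default.
    stream.seek(start)
    return DEFAULT_ENCODING


data = codecs.BOM_UTF16_LE + "<p>hi</p>".encode("utf-16-le")
stream = io.BytesIO(data)

first = detect_encoding(stream)   # sees the BOM and skips past it
second = detect_encoding(stream)  # BOM already consumed, so falls back
print(first, second)
```

The second call starts after the BOM, so it has nothing to go on and returns the fallback, which matches the windows-1252 result in the report.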
FWIW, detectEncoding should never be called manually; it'll be called if no encoding is specified. Somebody (likely me) really needs to do something about the html5lib docs…

/g