Hi Peter,

I discovered yesterday that there's a file name conflict between
css-backgrounds-3/border-image-slice-001.xht and
css-backgrounds-3/border-image-slice-001.htm that isn't caught by the
build system.

It turned out that border-image-slice-001.htm (which is encoded in
utf-16-le) was being parsed as windows-1252, so no elements were
recognized and the file was dropped as "not a test". HTMLSource.parse
failed to detect utf-16-le because of the way it handles encoding
detection.

HTMLBinaryInputStream.__init__ already calls detectEncoding(), so the
UTF-16 BOM is no longer in the stream by the time HTMLSource.parse
calls detectEncoding() manually. The second call therefore finds no
BOM and falls back to windows-1252. Attached is a patch that removes
the manual handling and instead relies on HTMLParser.parse to handle
the encoding detection itself.
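
To illustrate the failure mode, here is a minimal standalone sketch. It
is not the actual html5lib or w3ctestlib code: detect_encoding() is a
hypothetical stand-in that only sniffs a UTF-16 BOM and otherwise
returns windows-1252, which I'm assuming is close enough to the real
behaviour for this file. It shows how a second detection pass on the
same stream misses a BOM that the first pass already consumed.

```python
import codecs
import io

FALLBACK = "windows-1252"


def detect_encoding(stream):
    """Hypothetical BOM sniffer: consumes the BOM when it finds one."""
    start = stream.tell()
    head = stream.read(2)
    if head == codecs.BOM_UTF16_LE:
        return "utf-16-le"  # stream is now positioned past the BOM
    if head == codecs.BOM_UTF16_BE:
        return "utf-16-be"
    stream.seek(start)  # nothing found: rewind and use the fallback
    return FALLBACK


# A tiny UTF-16-LE document with a BOM, standing in for the test file.
data = codecs.BOM_UTF16_LE + "<p>test</p>".encode("utf-16-le")
stream = io.BytesIO(data)

first = detect_encoding(stream)   # sees the BOM and consumes it
second = detect_encoding(stream)  # BOM is gone, so the fallback is used

body = stream.read()
garbled = body.decode(second)     # NUL-interleaved text, no usable tags
correct = body.decode(first)      # the markup as intended
```

Decoded with the second result, every character is followed by a NUL,
so a parser would not see any elements, which matches the "not a test"
outcome above.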

Could you apply the patch to <https://hg.csswg.org/dev/w3ctestlib>? I
don't believe I have push access myself.

Thanks,
Ms2ger

Attachment: w3ctestlib-encoding.diff
Description: application/pgp-keys