I just noticed that section 2.7.1 of HTML5 says:

  Extensions must not be used for determining resource types
  for resources fetched over HTTP.

While I understand the reasons for this, there are certainly cases where this will break sites (basically those served over HTTP/0.9, or over a later HTTP version without a Content-Type header). In particular, the HTML sniffing in the algorithm is very limited and wouldn't sniff this document:

  <body>Some text</body>

as HTML.
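
To make the problem concrete, here is a rough sketch of a prefix-only sniffer. The marker list and the name looks_like_html are my own assumptions for illustration; this is not the spec's actual table, and not Gecko's code. It just shows how a sniffer that only recognizes a few leading tokens misses the document above:

```python
# Hypothetical marker list, for illustration only; NOT the spec's actual table.
HTML_PREFIXES = (b"<!doctype html", b"<html", b"<head")


def looks_like_html(body: bytes) -> bool:
    """Return True if the body starts with one of the assumed HTML markers."""
    # Skip leading whitespace, then compare case-insensitively.
    start = body.lstrip(b" \t\r\n").lower()
    return start.startswith(HTML_PREFIXES)


if __name__ == "__main__":
    print(looks_like_html(b"<html><body>Some text</body></html>"))
    print(looks_like_html(b"<body>Some text</body>"))
```

With this marker list, the second call returns False, so a response with no Content-Type and that body would not be treated as HTML.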

Now this case (no Content-Type at all) was pretty common when the unknown-type sniffer in Gecko was written, but that was years ago. Do we have any data on how common it is now?

-Boris

P.S. Of course, at the moment the sniffer in Gecko is used for more than just HTTP, and it looks like we'll need separate modes for things like HTTP and things like file://. I can live with that, though. For the file:// case, detection of HTML in documents with no doctype/<html>/<head> is a must.
