Re: [whatwg] text/html for html and xhtml
If the server infers the MIME type from content and sends it over HTTP, as it should, you can have both.

Chris

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Boris Zbarsky
Sent: Saturday, April 19, 2008 6:10 AM
To: William F Hammond
Cc: [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Subject: Re: [whatwg] text/html for html and xhtml

William F Hammond wrote:
> Or, if that is too hard or too politically difficult, going forward
> the WG should provide a formula for the front of a document that asks
> for an xhtml parse.

What is the benefit over using a MIME type as now, though?

-Boris
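[The inference Chris describes would look something like the TypeScript sketch below. It is illustrative only, not code from any real server: the function name and heuristics are invented, and real sniffing would also have to cope with BOMs, encodings, and leading comments.]

    // Hypothetical content-based MIME inference, as a server might apply
    // it to an uploaded document before choosing a Content-Type header.
    function inferMimeType(source: string): "application/xhtml+xml" | "text/html" {
      const head = source.trimStart().slice(0, 1024).toLowerCase();
      // An XML declaration, an XHTML DOCTYPE, or the XHTML namespace on
      // the root element suggests the document wants an XML parse...
      if (
        head.startsWith("<?xml") ||
        head.includes("//dtd xhtml") ||
        head.includes('xmlns="http://www.w3.org/1999/xhtml"')
      ) {
        return "application/xhtml+xml";
      }
      // ...anything else gets the forgiving type.
      return "text/html";
    }

    // Example: this document would be served as application/xhtml+xml.
    console.log(inferMimeType(
      '<?xml version="1.0"?><html xmlns="http://www.w3.org/1999/xhtml"/>'
    ));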
Re: [whatwg] text/html for html and xhtml
Křištof Želechovski wrote:
> If the server infers the MIME type from content and sends it over
> HTTP, as it should, you can have both.

Changing servers (including getting existing installs updated) is even more painful than changing browsers, though. It would be very nice if servers had better MIME type handling, but the reality is that they don't, and likely won't any time in the next several (5+, I would guess) years.

I'd love to be proved a hopeless pessimist on this point, of course. ;)

-Boris
Re: [whatwg] text/html for html and xhtml
William F Hammond wrote:
> 1. Many search engines appear not to look at application/xhtml+xml.

That seems like a much simpler thing to fix in search engines than in the specification and UAs, to be honest. I don't see this as a compelling reason to add complexity to the parsing model.

> 2. Many content providers have reported that they are stranded, i.e.,
> their contractors who receive the content by upload for subsequent
> placement under the eye of an http server do not support
> application/xhtml+xml.

This is the argument for any type of content-type sniffing, no? By this argument, why bother with MIME types at all?

> (And, of course, text/xml and application/xml are non-specific
> mimetypes for which there is no base namespace. They are sane content
> channels for web browsers only when display is entirely controlled
> with something like CSS.)

Uh... Have you tested this? As I recall, there are no major layout/rendering differences in Gecko, Opera, and Safari between an XHTML document sent as application/xhtml+xml and one sent as application/xml. In both cases, it needs to have the XHTML namespace on all the nodes to be handled correctly.

There are differences in terms of the DOM: the document doesn't necessarily implement the HTMLDocument interface. HTML5 proposes to change that so that all Documents implement HTMLDocument if the UA supports HTMLDocument at all. At that point it really won't matter, from a DOM point of view, whether XHTML is served as application/xhtml+xml.

There might be a new behavior difference introduced if the body background special-casing in CSS is extended to apply to application/xhtml+xml like it applies to text/html now. But I'd hardly call application/xml delivery of XHTML insane now, and even less so after the HTMLDocument change is made.

If you're talking about UAs other than those three that support application/xhtml+xml, I'll admit to not knowing what the situation is with those.

-Boris
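[Boris's point about the two XML types can be observed directly today. The sketch below is a present-day illustration, not code from the 2008 discussion; it runs in a browser console, with DOMParser standing in for the network layer.]

    // Parse identical XHTML markup under both XML types and compare the
    // results.
    const markup =
      '<html xmlns="http://www.w3.org/1999/xhtml">' +
      "<head><title>probe</title></head><body><p>hi</p></body></html>";

    for (const type of ["application/xml", "application/xhtml+xml"] as const) {
      const doc = new DOMParser().parseFromString(markup, type);
      console.log(
        type,
        // The xmlns attribute above is what makes these elements XHTML
        // at all; without it the nodes land in no namespace under
        // either type.
        "root namespace:", doc.documentElement.namespaceURI,
        // The DOM difference Boris mentions: whether the result
        // implements HTMLDocument. Modern browsers have since made
        // HTMLDocument an alias of Document, much as HTML5 proposed, so
        // the distinction has largely evaporated.
        "HTMLDocument:", doc instanceof HTMLDocument
      );
    }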
Re: [whatwg] text/html for html and xhtml
William F Hammond wrote:
>> Perhaps you should clearly state your definitions of bad and good in
>> this case? I'd also like to know, given those definitions, why it's
>> bad for the bad documents to drive out the good, and how you think
>> your proposal will prevent that from happening.
>
> Good and bad here apply to document instances. Good means compliant
> xhtml+(mathml|svg)*; bad, as I casually used it, means other.

OK.

> My only point is that a user agent should parse as xml a document
> whose preamble indicates xhtml even when the mimetype is text/html.

That would break a large fraction of popular websites out there.

In addition, detecting the preamble requires assuming a parsing model. I'm pretty sure one can construct documents that have different preambles when treated as HTML and XML.

> Or, if that is too hard or too politically difficult, going forward
> the WG should provide a formula for the front of a document that asks
> for an xhtml parse.

What is the benefit over using a MIME type as now, though?

-Boris
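[To make Boris's parsing-model point concrete, here is a hedged TypeScript sketch; the sniffer function is invented for illustration. A scan for XML-ish markers can classify as XHTML a document that an HTML parser would treat as plain HTML.]

    // A naive preamble sniffer: scan the first few hundred characters
    // for XML-ish markers.
    function naiveLooksLikeXhtml(source: string): boolean {
      const head = source.slice(0, 512);
      return /<\?xml/i.test(head) || /<html[^>]*\sxmlns=/i.test(head);
    }

    // An HTML parser treats the leading "<!-- ... -->" as a comment, so
    // this is an ordinary HTML document. The sniffer, though, finds
    // "<?xml" inside the comment and would hand the bytes to an XML
    // parser, which must reject them: an XML declaration is only allowed
    // at the very start of a document.
    const tricky =
      '<!-- <?xml version="1.0"?> --><!DOCTYPE html><html><body></body></html>';
    console.log(naiveLooksLikeXhtml(tricky)); // true, yet it is not XHTML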
Re: [whatwg] text/html for html and xhtml (Was: Supporting MathML and SVG in text/html, and related topics)
On 17/04/2008, William F Hammond <[EMAIL PROTECTED]> wrote:
> Previously:
>
>>> Yes, but the point is, once a user agent begins to sniff, there's
>>> no rational excuse for it not to recognize compliant
>>> xhtml+(mathml|svg).
>>
>> Yes there is. Live content relies on even perfectly well formed
>> XHTML getting the HTML behaviours of CSS and the DOM. An XML parse
>> also treats all elements as having #PCDATA content. Thus scripts and
>> style sheets would be given an incompatible parsing that changes the
>> meaning of '<', '&' and XML comments within scripts, just to take
>> one example.
>
> That is, a script which is well formed and valid XML, and which is
> XML well formedness-compatible and proper HTML, may have entirely
> textual content.

(The subset of live XHTML content that uses embedded scripts which are also XML well formed without using explicit CDATA wrapping is very small, though.)

>>> What obstacles to this exist?
>>
>> The Web.
>
> Really!?!

Really.

> And then:
>
>> The Web.
>
> Really!?!

Yes, see for instance:
http://lists.w3.org/Archives/Public/public-html/2007Aug/1248.html

> Taylor's comment is mainly about what happens when a user agent
> confuses tag soup with good xhtml. It is a different question how a
> user agent decides what it is looking at. Whether there is one
> mimetype or two, erroneous content will need handling.
>
> The experiment begun around 2001 of punishing bad documents in
> application/xhtml+xml seems to have led to that mime type not being
> much used.

We don't know how big a factor the draconianness of XML parsing really is. The fact is, the single biggest consumer of those documents has not begun supporting XHTML yet. Internet Explorer supports HTML and XML, but not the XHTML namespace in XML, nor the XHTML content type. This alone makes everybody reluctant to serve application/xhtml+xml. Sure, there are other complications from the XML draconianness than this, but my point is that these are all compounded, so it's hard to tell how effectively they have been put to the test. If you could run the test again with Internet Explorer's non-support taken out of the equation, then you would be able to say something about it. As it is currently, you can't know either way.

> So user agents need to learn how to recognize the good and the bad in
> both mimetypes. Otherwise you have Gresham's Law: the bad documents
> will drive out the good.
>
> The logical way to go might be this: If it has a preamble beginning
> with <?xml or a sensible xhtml DOCTYPE declaration or a first element
> <html xmlns=...>, then handle it as xhtml unless and until it proves
> to be non-compliant xhtml (e.g., not well-formed xml, unquoted
> attributes, munged handling of xml namespaces, ...). At the point it
> proves to be bad xhtml, reload it and treat it as regular html.

Doesn't work. We need DOM and CSS treatment as in HTML, not as in XHTML, to be compatible with live content for those circumstances too.

> So most bogus xhtml will then be 1 or 2 seconds slower than good
> xhtml. Astute content providers will notice that and then do
> something about it. It provides a feedback mechanism for making the
> web become better.

So you argue that a document with an XHTML structure served as text/html should change semantics in ways that affect functionality, behaviour and presentation because of, e.g., a single unescaped ampersand in a URI, or a single character that breaks because of encoding?

My opinion: any feedback mechanism that directly hurts the user, and only indirectly hurts the publisher, as opposed to a feedback mechanism that directly notifies the publisher, is totally backwards. Fail early. Compile time is better than run time because a failure there is instantly obvious to the programmer: the build isn't compiling, so there's no working but buggy build to give users.

The analogy for web content is that you should fail at publishing time instead of viewing time if possible, because then you HAVE to correct your documents before you can serve them to the user. If you want to serve XML to users on the web, you should make sure your tools cannot possibly serve malformed XML, by making absolutely certain that the content has correct encoding (any defaulting must confirm that the content actually conforms to the default encoding), has a specified content type (defaulting is acceptable for fragments here, but e.g. uploading raw files should require specifying the type), and is a well formed fragment or document at publishing time, loudly rejecting any content that is malformed. (And by publishing I include all sources: design templates, content producers, information from the database, advertisements, comments, trackbacks etc.)

--
David "liorean" Andersson
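[Two of David's points, the '<' and '&' hazard in inline scripts and the publish-time gate, fit into one illustrative sketch. The function name is invented, and the browser's DOMParser stands in for whatever parser a real publishing tool would use; run it in a browser console.]

    // A sketch of the publish-time gate David describes. XML parse
    // failures from DOMParser surface as a <parsererror> element rather
    // than an exception.
    function assertWellFormedXhtml(markup: string): void {
      const doc = new DOMParser().parseFromString(markup, "application/xhtml+xml");
      if (doc.getElementsByTagName("parsererror").length > 0) {
        throw new Error("Refusing to publish: content is not well-formed XML");
      }
    }

    // This inline script is fine as text/html, where script content is
    // raw text, but as XML the '<' and '&' are markup, so the gate
    // rejects the whole document.
    const page =
      '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title>' +
      "<script>if (a < b && c) run();</script></head><body/></html>";

    try {
      assertWellFormedXhtml(page); // throws: fail at publishing time,
    } catch (e) {                  // not in the reader's browser
      console.log((e as Error).message);
    }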
Re: [whatwg] text/html for html and xhtml (Was: Supporting MathML and SVG in text/html, and related topics)
William F Hammond wrote:
> The experiment begun around 2001 of punishing bad documents in
> application/xhtml+xml seems to have led to that mime type not being
> much used.

That has more to do with the fact that it wasn't supported in browsers used by 90+% of users for a number of years.

> So user agents need to learn how to recognize the good and the bad in
> both mimetypes.

Recognize and do what with it?

> Otherwise you have Gresham's Law: the bad documents will drive out
> the good.

Perhaps you should clearly state your definitions of bad and good in this case? I'd also like to know, given those definitions, why it's bad for the bad documents to drive out the good, and how you think your proposal will prevent that from happening.

> If it has a preamble beginning with <?xml or a sensible xhtml DOCTYPE
> declaration or a first element <html xmlns=...>, then handle it as
> xhtml unless and until it proves to be non-compliant xhtml (e.g., not
> well-formed xml, unquoted attributes, munged handling of xml
> namespaces, ...). At the point it proves to be bad xhtml, reload it
> and treat it as regular html.

What's the benefit? This seems to give the worst of both worlds, as well as a poor user experience.

> So most bogus xhtml will then be 1 or 2 seconds slower than good
> xhtml. Astute content providers will notice that and then do
> something about it. It provides a feedback mechanism for making the
> web become better.

In the meantime, it punishes the users for things outside their control by degrading their user experience. It also provides a competitive advantage to UAs who ignore your proposal. Sounds like an unstable equilibrium to me, even if attainable.

-Boris
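[For concreteness, the fallback scheme under discussion might look like the sketch below. All names are invented, and DOMParser stands in for a UA's real parser; it shows how a single well-formedness error forces a complete second parse, which is the delay Hammond counts on and the "worst of both worlds" Boris objects to.]

    // Try a strict XHTML parse first; on failure, re-parse as HTML.
    function parseWithFallback(source: string): { doc: Document; reparsed: boolean } {
      const asXml = new DOMParser().parseFromString(source, "application/xhtml+xml");
      if (asXml.getElementsByTagName("parsererror").length === 0) {
        return { doc: asXml, reparsed: false };
      }
      // The reload step: a real UA would have to throw away everything
      // and start over with the HTML parser, after possibly having run
      // scripts and fired events from the partial XML parse.
      return {
        doc: new DOMParser().parseFromString(source, "text/html"),
        reparsed: true,
      };
    }

    // One stray unescaped ampersand is enough to trigger the slow path.
    const bogus =
      '<html xmlns="http://www.w3.org/1999/xhtml"><body>' +
      '<a href="?a=1&b=2">link</a></body></html>';
    console.log(parseWithFallback(bogus).reparsed); // true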