Re: [whatwg] text/html for html and xhtml

2008-04-28 Thread Křištof Želechovski
If the server infers the MIME type from content and sends it over HTTP as it
should, you can have both.
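For illustration, that content-based choice might look roughly like the
following (a Node-style sketch; the document root, port and handler are
invented, and there is no path sanitising):

  // Rough sketch: choose a Content-Type from the leading bytes of the file
  // rather than from the file extension alone.
  import { createServer } from "node:http";
  import { readFile } from "node:fs/promises";

  const XHTML_NS = 'xmlns="http://www.w3.org/1999/xhtml"';

  function sniffType(body: string): string {
    const head = body.slice(0, 1024).trimStart();
    // An XML declaration or an XHTML-namespaced root suggests XHTML.
    if (head.startsWith("<?xml") || head.includes(XHTML_NS)) {
      return "application/xhtml+xml; charset=utf-8";
    }
    return "text/html; charset=utf-8";
  }

  createServer(async (req, res) => {
    try {
      const body = await readFile("./docs" + (req.url ?? "/"), "utf-8");
      res.writeHead(200, { "Content-Type": sniffType(body) });
      res.end(body);
    } catch {
      res.writeHead(404);
      res.end("Not found");
    }
  }).listen(8080);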
Chris

Boris Zbarsky wrote on Saturday, April 19, 2008:

William F Hammond wrote:
 Or, if that is too hard or too politically difficult,
 going forward the WG should provide a formula for the front of a
 document that asks for an xhtml parse.

What is the benefit over using a MIME type as now, though?

-Boris




Re: [whatwg] text/html for html and xhtml

2008-04-28 Thread Boris Zbarsky

Křištof Želechovski wrote:

If the server infers the MIME type from content and sends it over HTTP as it
should, you can have both.


Changing servers (including getting existing installs updated) is even more 
painful than changing browsers, though.


It would be very nice if servers had better MIME type handling, but the reality 
is that they don't, and likely won't any time in the next several (5+, I would 
guess) years.


I'd love to be proved a hopeless pessimist on this point, of course.  ;)

-Boris






Re: [whatwg] text/html for html and xhtml

2008-04-20 Thread Boris Zbarsky

William F Hammond wrote:

1.  Many search engines appear not to look at application/xhtml+xml.


That seems like a much simpler thing to fix in search engines than in 
the specification and UAs, to be honest.  I don't see this as a 
compelling reason to add complexity to the parsing model.



2.  Many content providers have reported that they are stranded,
i.e., their contractors who receive the content by upload for
subsequent placement under the eye of an http server do not
support application/xhtml+xml.


This is the argument for any type of content-type sniffing, no?  By this 
argument, why bother with MIME types at all?



(And, of course, text/xml and application/xml are non-specific
mimetypes for which there is no base namespace.  They are sane content
channels for web browsers only when display is entirely controlled
with something like CSS.)


Uh...  Have you tested this?  As I recall there are no major 
layout/rendering differences in Gecko, Opera, and Safari between an 
XHTML document sent as application/xhtml+xml and one sent as 
application/xml.  In both cases, it needs to have the XHTML namespace on 
all the nodes to be handled correctly.  There are differences in terms 
of the DOM: the document doesn't necessarily implement the HTMLDocument 
interface.  HTML5 proposes to change that so that all Documents 
implement HTMLDocument if the UA supports HTMLDocument at all.  At that 
point it really won't matter whether XHTML is served as 
application/xhtml+xml from a DOM point of view.  There might be a new 
behavior difference introduced if the body background special-casing 
in CSS is extended to apply to application/xhtml+xml like it applies to 
text/html now.  But I'd hardly call application/xml delivery of XHTML 
insane now, and even less so after the HTMLDocument change is made.
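As a side note, it is the namespace rather than the MIME type that decides
which DOM interfaces the elements get; a quick sketch one can try in a
browser console (DOMParser; the constructor names shown are how current
Gecko/Blink report them and may differ elsewhere):

  // Parse the same XHTML source as application/xhtml+xml and as plain
  // application/xml, then compare what the root element turns into.
  const src =
    '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>t</title></head>' +
    "<body><p>hi</p></body></html>";

  const asXhtml = new DOMParser().parseFromString(src, "application/xhtml+xml");
  const asXml = new DOMParser().parseFromString(src, "application/xml");

  // Both roots are in the XHTML namespace, so both get the HTML interfaces.
  console.log(asXhtml.documentElement.constructor.name); // HTMLHtmlElement
  console.log(asXml.documentElement.constructor.name);   // HTMLHtmlElement

  // Drop the namespace and the same markup is just generic Elements.
  const bare = new DOMParser()
    .parseFromString("<html><body><p>hi</p></body></html>", "application/xml");
  console.log(bare.documentElement.constructor.name);    // Element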


If you're talking about UAs other than those three that support 
application/xhtml+xml, I'll admit to not knowing what the situation is 
with those.


-Boris




Re: [whatwg] text/html for html and xhtml

2008-04-18 Thread Boris Zbarsky

William F Hammond wrote:

Perhaps you should clearly state your definitions of bad and good
in this case?  I'd also like to know, given those definitions, why
it's bad for the bad documents to drive out the good, and how you
think your proposal will prevent that from happening.


Good and bad here apply to document instances.  Good means
compliant xhtml+(mathml|svg)*; bad, as I casually used it, means
other.


OK.


My only point is that a user agent should parse as xml a
document whose preamble indicates xhtml even when the mimetype is
text/html. 


That would break a large fraction of popular websites out there.  In 
addition detecting the preamble requires assumption of a parsing 
model.  I'm pretty sure one can construct documents that have different 
preambles when treated as HTML and XML.



Or, if that is too hard or too politically difficult,
going forward the WG should provide a formula for the front of a
document that asks for an xhtml parse.


What is the benefit over using a MIME type as now, though?

-Boris



Re: [whatwg] text/html for html and xhtml (Was: Supporting MathML and SVG in text/html, and related topics)

2008-04-17 Thread liorean
On 17/04/2008, William F Hammond [EMAIL PROTECTED] wrote:
  Previously:

   Yes, but the point is, once a user agent begins to sniff, there's no
   rational excuse for it not to recognize compliant xhtml+(mathml|svg).

Yes there is. Live content relies on even perfectly well-formed XHTML to
have the HTML behaviours of CSS and the DOM. It also relies on all
elements having #PCDATA content. Thus scripts and style sheets would
be given an incompatible parsing that changes the meaning of '<', '&'
and XML comments within scripts, just to take one example. That is, a
script which is well-formed and valid XML and which is XML
well-formedness-compatible and proper HTML may have entirely textual
content. (The subset of live XHTML content that uses embedded scripts
which are also well-formed XML without using explicit CDATA wrapping
is very small, though.)
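One small way to see that script difference (a DOMParser sketch with
invented markup; in a real text/html page the comment-wrapped script still
executes, while the XML parse leaves the script element empty):

  // The same inline script parsed as text/html and as XHTML.  In HTML the
  // comment markers are part of the script's raw text; in XML they become a
  // real comment node, so the script element has no text at all.
  const src =
    '<html xmlns="http://www.w3.org/1999/xhtml"><head><script><!--\n' +
    "window.flag = true;\n" +
    "//--></script></head><body></body></html>";

  const htmlDoc = new DOMParser().parseFromString(src, "text/html");
  const xmlDoc = new DOMParser().parseFromString(src, "application/xhtml+xml");

  console.log(JSON.stringify(htmlDoc.querySelector("script")!.textContent));
  // "<!--\nwindow.flag = true;\n//-->"  -- the code is still there
  console.log(JSON.stringify(xmlDoc.querySelector("script")!.textContent));
  // ""                                  -- only a comment node remains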

What obstacles to this exist?
   
The Web.

   Really!?!

Really.

  And then:

   The Web.
  
   Really!?!
  
   Yes, see for instance:
  
  http://lists.w3.org/Archives/Public/public-html/2007Aug/1248.html

  Taylor's comment is mainly about what happens when a user agent
  confuses tag soup with good xhtml.

  It is a different question how a user agent decides what it is looking
  at.

  Whether there is one mimetype or two, erroneous content will need
  handling.  The experiment begun around 2001 of punishing bad
  documents in application/xhtml+xml seems to have led to that mime type
  not being much used.

We don't know how big a factor the draconianness of XML parsing really
is. The fact is, the single biggest consumer of those documents has
not begun supporting XHTML yet. Internet Explorer supports HTML and
XML but not the XHTML namespace in XML, nor the XHTML content type.
This alone makes everybody reluctant to serve application/xhtml+xml.
Sure, there are other complications from the XML draconianness than
this, but my point is that these are all compounded, so it's hard to
tell how effectively they have been put to the test. If you could run
the test again with Internet Explorer's non-support taken out of the
equation, then you would be able to say something about it. As it is
currently, you can't know either way.

  So user agents need to learn how to recognize the good and the bad
  in both mimetypes.

  Otherwise you have Gresham's Law: the bad documents will drive out the
  good.

  The logical way to go might be this:

  If it has a preamble beginning with "^<?xml " or a sensible
  xhtml DOCTYPE declaration or a first element <html xmlns=...>,
  then handle it as xhtml unless and until it proves to be non-compliant
  xhtml (e.g., not well-formed xml, unquoted attributes, munged handling
  of xml namespaces, ...).  At the point it proves to be bad xhtml, reload
  it and treat it as regular html.

Doesn't work. We need DOM and CSS treatment as in HTML, not as in
XHTML, to be compatible with live content for those circumstances too.

  So most bogus xhtml will then be 1 or 2 seconds slower than good xhtml.
  Astute content providers will notice that and then do something about it.
  It provides a feedback mechanism for making the web become better.

So, you argue that a document with an XHTML structure as text/html
should change semantics in ways that will affect functionality,
behaviour and presentation because of e.g. a single unescaped
ampersand in a URI or a single character that breaks because of
encoding?




My opinion:
Any feedback mechanism that directly hurts the user and only
indirectly hurts the publisher, as opposed to a feedback mechanism
that directly notifies the publisher, is totally backwards. Fail
early. Compile time is better than run time because that's instantly
obvious to the programmer - the build isn't compiling, so there's
no working but buggy build to give users. The analogy for web
content is that you should fail at publishing time instead of viewing
time if possible, because then you HAVE to correct your documents
before you can serve them to the user.

If you want to serve XML to users on the web, you should make sure
your tools cannot possibly serve malformed XML, by making absolutely
certain that the content has correct encoding (any defaulting must
confirm that the content actually conforms to the default encoding),
has a specified content type (defaulting is acceptable for fragments
here, but e.g. uploading raw files should require specifying the type)
and is a well formed fragment or document at publishing time, loudly
rejecting any content that is malformed.   (And by publishing I
include all sources: design templates, content producers, information
from the database, advertisements, comments, trackbacks etc.)
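
To make the "reject at publishing time" idea concrete, a minimal sketch
(assuming a DOMParser is available in the publishing pipeline, e.g. in a
browser or via something like jsdom on a server; the function name is made
up):

  // Refuse to publish anything that is not well-formed XML.  Browsers report
  // XML parse failures as a <parsererror> element in the returned document
  // rather than by throwing, so check for it explicitly.
  function assertWellFormedXml(source: string): void {
    const doc = new DOMParser().parseFromString(source, "application/xml");
    const error = doc.getElementsByTagName("parsererror")[0];
    if (error) {
      // Fail loudly here, at publish time, not in the reader's browser.
      throw new Error("Refusing to publish malformed XML: " + error.textContent);
    }
  }

  assertWellFormedXml('<p xmlns="http://www.w3.org/1999/xhtml">fine &amp; well-formed</p>'); // ok
  assertWellFormedXml("<p>broken & not well-formed</p>"); // throws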
-- 
David liorean Andersson


Re: [whatwg] text/html for html and xhtml (Was: Supporting MathML and SVG in text/html, and related topics)

2008-04-16 Thread Boris Zbarsky

William F Hammond wrote:

The experiment begun around 2001 of punishing bad
documents in application/xhtml+xml seems to have led to that mime type
not being much used.


That has more to do with the fact that it wasn't supported in browsers 
used by 90+% of users for a number of years.



So user agents need to learn how to recognize the good and the bad
in both mimetypes.


Recognize and do what with it?


Otherwise you have Gresham's Law: the bad documents will drive out the
good.


Perhaps you should clearly state your definitions of bad and good in 
this case?  I'd also like to know, given those definitions, why it's bad 
for the bad documents to drive out the good, and how you think your 
proposal will prevent that from happening.



If it has a preamble beginning with "^<?xml " or a sensible
xhtml DOCTYPE declaration or a first element <html xmlns=...>,
then handle it as xhtml unless and until it proves to be non-compliant
xhtml (e.g., not well-formed xml, unquoted attributes, munged handling
of xml namespaces, ...).  At the point it proves to be bad xhtml, reload
it and treat it as regular html.


What's the benefit?  This seems to give the worst of both worlds, as 
well as a poor user experience.



So most bogus xhtml will then be 1 or 2 seconds slower than good xhtml.
Astute content providers will notice that and then do something about it.
It provides a feedback mechanism for making the web become better.


In the meantime, it punishes the users for things outside their control 
by degrading their user experience.  It also provides a competitive 
advantage to UAs who ignore your proposal.


Sounds like an unstable equilibrium to me, even if attainable.

-Boris