date:20061105

Re: [whatwg] Footnotes, endnotes, sidenotes

2006-11-05 Thread Alexey Feldgendler

On Sat, 04 Nov 2006 19:21:42 +0600, Matthew Paul Thomas [EMAIL PROTECTED] 
wrote:

 Footnotes and endnotes are identical in content in the context of a
 print document and I am not certain how they'd differ even
 presentationally on a web page, so yes, I think those can be
 considered identical in terms of markup. 

 Scholarly books sometimes use both footnotes and endnotes for different
 things -- footnotes for citations and endnotes for tangential
 discussions, or vice versa. I've never seen an HTML document try to
 make this distinction, though.

That's because HTML documents can only have endnotes so far.


-- 
Alexey Feldgendler [EMAIL PROTECTED]
[ICQ: 115226275] http://feldgendler.livejournal.com

Re: [whatwg] getElementsByClassName() idea

2006-11-05 Thread Anne van Kesteren

On Sun, 05 Nov 2006 10:55:05 +0100, Alexey Feldgendler  
[EMAIL PROTECTED] wrote:

I think this hasn't been suggested before. Perhaps the method should
accept a DOMTokenString as argument instead of an array. This allows
things like ele.getElementsByClassName(ele.className) etc. The only
problem is how to get a DOMTokenString without first getting .className
 from somewhere. Perhaps it should be a constructor as well. 'x = new
DOMTokenString("aaa bbb")'


How is it better than DOMString?


It inherits from DOMString.  
http://www.whatwg.org/specs/web-apps/current-work/#domtokenstring defines  
it.



Hixie, the title attribute of the remove(token) definition says  
dom-tokenstring-add rather than dom-tokenstring-remove...



--
Anne van Kesteren
http://annevankesteren.nl/
http://www.opera.com/

Re: [whatwg] getElementsByClassName() idea

2006-11-05 Thread Lachlan Hunt


Lachlan Hunt wrote:

Anne van Kesteren wrote:

This allows things like ele.getElementsByClassName(ele.className) etc.


anything that accepts a DOMString will automatically accept a DOMTokenString, 
including getElementsByClassName.  So your example will already work.


It seems getElementsByClassName has been changed to accept an array, not 
a DOMString and I didn't realise.


--
Lachlan Hunt
http://lachy.id.au/

Re: [whatwg] The problems with namespaces in text/html

2006-11-05 Thread Elliotte Harold


William F Hammond wrote:


This thread is specifically about documents on the web for
_presentation_ by _browser-class_ user agents.



There is no such thing, and the sooner we realize that the better. There 
are documents on the Web, which may be processed by many different 
classes and types of agents for many different needs. These include 
classic desktop browsers, cell phone browsers, text-mode browsers, web 
spiders, search engines, intelligent agents, and much more.


The document format published on the Web should not prevent any of these 
uses. That means it certainly must be well-renderable in a desktop 
browser. However, we must not assume that is the only thing that will be 
done with it.


One of the major original goals of XML *and* HTML *and* SGML was to 
separate content from presentation. I am frankly shocked to see this 
basic principle being so off-handedly thrown out the window. That's why 
I'm having such a hard time believing what I'm hearing. Do people really 
want to reverse the course the Web has been on for almost 20 years? 
Especially now when diversity is increasing? In 1995 there really was 
only one browser of great significance, and almost all Web browsing was 
done in a desktop GUI. That's no longer true, and it's going to become 
less true as time passes.



Have you seen this comment from TimBL?

 http://dig.csail.mit.edu/breadcrumbs/node/166


Most certainly. My response is here:

http://cafe.elharo.com/xml/why-tim-berners-lee-is-wrong/


Have you been involved in generating XHTML+MathML content that is
presently on the web?  If so, I'd like to know where so I can have
a look.


Not heavily, but I've played with it. See, for example,

http://www.cafeconleche.org/slides/sd2005west/xmlfundamentals/42.html
http://www.cafeconleche.org/slides/sd2005west/xmlfundamentals/examples/maxwell.xml

Firefox seems to handle it. Safari can't.

--
Elliotte Rusty Harold  [EMAIL PROTECTED]
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/

Re: [whatwg] The problems with namespaces in text/html (Was: MathML-in-HTML5)

2006-11-05 Thread Elliotte Harold


Lachlan Hunt wrote:


No, not without namespaces, just without the xmlns and QNames syntax.

e.g. when <math> is encountered in text/html, it appears in the DOM as 
<math xmlns="http://www.w3.org/1998/Math/MathML">


That's like saying you want to have biology but without all that yucky 
evolution silliness.


If you don't have xmlns and xmlns:prefix then there are no namespaces, 
period. I think some people have drunk too much Infoset Kool-aid. Walter 
Perry, I knew you were right, but I didn't know how right you were.


I don't care what appears in the DOM. My model is not the DOM. Most 
models are not the DOM.


All we have is the document's text. This is what must be defined. If 
there are no namespaces in the text, then there are no namespaces. The 
DOM is a transitory model used locally. It is not the document.


We definitely don't want people thinking they can use any arbitrary 
xmlns in HTML.  That's what XHTML is for.


I'm not sure why that bothers you. As long as things are well-formed, 
what's the harm? Existing browsers seem to deal OK and in a fairly 
well-defined way with content from arbitrary namespaces. (They ignore 
it.) I've taken advantage of this for years in my own Web pages.


--
Elliotte Rusty Harold  [EMAIL PROTECTED]
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/

Re: [whatwg] The problems with namespaces in text/html

2006-11-05 Thread Elliotte Harold


Henri Sivonen wrote:

Personally, I think MathML is so hopelessly verbose for hand authoring 
that this really shouldn't be about enabling hand authoring 
MathML-in-HTML5 but about enabling MathML-in-HTML5 (perhaps generated by 
a future version of itex2mml or similar) to be served through content 
management systems that are not built around a SAX pipeline or an XML 
tree API or XSLT but are built as tag soup systems and simply cannot 
guarantee well-formedness. I mean systems like WordPress and MovableType.


Please don't confuse what these systems won't do with what they can't 
do.  I personally wrote a system like this that maintained full 
well-formedness at all times despite TagSoup input. In fact, that was 
one of the easiest parts of what it did. See


http://cafe.elharo.com/web/mokka/

Today's tools and libraries make it easy for anyone using anything more 
advanced than a text editor and FTP to publish well-formed documents. 
Tools that don't do that by default should be fixed.


--
Elliotte Rusty Harold  [EMAIL PROTECTED]
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/

Re: [whatwg] The utility function for semantics in HTML

2006-11-05 Thread James Graham


Elliotte Harold wrote:
I suspect there are actually two axes here, and they're not orthogonal, 


[...]

I agree we don't want to go all the way to 1 on the first axis. 


[...]


However, I would turn the second axis all the way to about 0.99


Just to note that, in your model of non-orthogonal axes, you can't 
adjust the values independently like this. Moreover, your model of the 
axes being number of semantic elements and fraction of semantic elements 
doesn't work terribly well -- consider Matthew's point about the 
existence of <i> and <b> increasing the utility of <em> and <strong>.


--
The universe doesn't care what you believe. The wonderful thing about 
science is that it doesn't ask for your faith, it just asks for your 
eyes --- http://xkcd.com/c154.html

Re: [whatwg] The problems with namespaces in text/html

2006-11-05 Thread Elliotte Harold


Henri Sivonen wrote:

A conforming HTML5 byte stream is *never* a well-formed XML 1.0 byte 
stream. 


Really? Never? There are many HTML 4 documents that are well-formed XML 
documents. Are these not legal HTML 5 documents? I scanned the spec 
quickly, but I didn't find anything that was flat out forbidden by XML. 
Is there some variant of the XML declaration or a DOCTYPE or requirement 
for an unclosed start-tag that would automatically make all HTML 5 
documents malformed XML?


--
Elliotte Rusty Harold  [EMAIL PROTECTED]
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/

[whatwg] Typo in 9.2.3

2006-11-05 Thread Elliotte Harold

Otherwise if the next seven chacacters are a case-insensitive match for 
the word DOCTYPE, then consume those characters and switch to the 
DOCTYPE state.


chacacters -> characters

--
Elliotte Rusty Harold  [EMAIL PROTECTED]
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/

Re: [whatwg] img element comments

2006-11-05 Thread Elliotte Harold


Lachlan Hunt wrote:

Using attributes to describe actual metadata about an image that has 
real practical benefits, for both the author and user, is not 
presentational in my view.


Yes, but that is not what the height and width attributes are. They say 
nothing about the image and everything about the size at which the image 
is drawn.


There's even an edge case where specifying incorrect dimensions could 
still be considered semantic.  Unfortunately, I can't find the site I'm 
thinking of, but I've seen a site somewhere that created art by using 
small images and stretching them for the pixelation effect.  In this 
case, stretching the image is part of the artwork's artistic value and 
meaning, not just its presentation, and it would lose it all if the 
image were shown at its actual size.




There are always edge cases. The distinction between semantics and 
presentation is a fuzzy one. Nonetheless, I think most of the time 
height and width as specified on today's img tags are clearly 
presentational.


--
Elliotte Rusty Harold  [EMAIL PROTECTED]
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/

Re: [whatwg] The problems with namespaces in text/html (Was: MathML-in-HTML5)

2006-11-05 Thread Lachlan Hunt


Elliotte Harold wrote:

Lachlan Hunt wrote:

No, not without namespaces, just without the xmlns and QNames syntax.

e.g. when <math> is encountered in text/html, it appears in the DOM as 
<math xmlns="http://www.w3.org/1998/Math/MathML">


That's like saying you want to have biology but without all that yucky 
evolution silliness.


If you don't have xmlns and xmlns:prefix then there are no namespaces, 
period.


In XML, that is absolutely true.  However, we are talking about 
text/html only.


I don't care what appears in the DOM. My model is not the DOM. Most 
models are not the DOM.


Does that really matter? It's the concept that matters, not the specific 
model used.  The DOM is just a convenient model to use in discussion.


All we have is the document's text. This is what must be defined. If 
there are no namespaces in the text, then there are no namespaces.


Why is the specific syntax so important?  If, in HTML (not XHTML), 
<math> is defined to be interpreted as the math element in the MathML 
namespace, what difference does the syntax make in the end?  All HTML 
elements are already defined to be in the XHTML namespace without any 
xmlns in the syntax, so how is that any different?
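
A minimal sketch of that idea, in Python. The rule used here (an element named math, or nested inside one, lands in the MathML namespace; everything else lands in the XHTML namespace) is an assumption made for illustration, not text from any draft, and implied_namespace is a hypothetical helper.

```python
XHTML_NS = "http://www.w3.org/1999/xhtml"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def implied_namespace(name, ancestors=()):
    """Return the namespace a text/html parser could assign to an element.

    `name` is the element's local name; `ancestors` lists the local names
    of its open ancestors. No xmlns attribute is consulted at all.
    """
    # Assumed rule: <math> and everything inside it is MathML.
    if name == "math" or "math" in ancestors:
        return MATHML_NS
    # Everything else is treated as an ordinary HTML element.
    return XHTML_NS
```

The namespace is decided by the element name and its position, so the serialized document never needs to carry xmlns syntax.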


We definitely don't want people thinking they can use any arbitrary 
xmlns in HTML.  That's what XHTML is for.


I'm not sure why that bothers you. As long as things are well-formed, 
what's the harm?


text/html *does not* enforce well-formedness and *never will*.  That's 
the problem!



Existing browsers seem to deal OK and in a fairly well-defined way
with content from arbitrary namespaces. (They ignore it.) I've taken
advantage of this for years in my own Web pages.


Sure, in XML, that's true.  But in HTML, there currently are no 
namespaces (unless you count IE's disastrous XML Data Islands and Custom 
Tags, which also don't enforce well-formedness).


--
Lachlan Hunt
http://lachy.id.au/

Re: [whatwg] Footnotes, endnotes, sidenotes

2006-11-05 Thread Elliotte Harold


Matthew Paul Thomas wrote:

Scholarly books sometimes use both footnotes and endnotes for different 
things -- footnotes for citations and endnotes for tangential 
discussions, or vice versa. I've never seen an HTML document try to make 
this distinction, though.




Distinguishing footnotes and endnotes would require a multipage 
document: footnotes go at the bottom of this page, endnotes at the 
bottom of some other page.


Since HTML5 is primarily about single pages, I suggest calling any such 
element footnote and not having a separate endnote element. This is a 
good example of picking fewer over more semantics as discussed in 
another thread.


--
Elliotte Rusty Harold  [EMAIL PROTECTED]
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/

Re: [whatwg] Custom elements and attributes

2006-11-05 Thread Elliotte Harold


Øistein E. Andersen wrote:


I perfectly agree. (Actually, I think that U+7F (delete) and the C1 control 
characters should be excluded [transformed into U+FFFD] as well, but this 
could perhaps be problematic due to spurious CP1252 characters.)


Spurious Cp1252 is a real problem. In fact, incorrectly labeled encoding 
is a real problem, and a thorny one. Draconian error handling in XML 
solves this, but I'm not sure what HTML 5 should do here. It's worth 
thinking about though. It's also worth reviewing the work the W3C TAG 
and I18N working groups did on this issue since a lot of smart people 
did a lot of thinking about this quite recently:


http://www.w3.org/2001/tag/doc/mime-respect-20060412
http://www.w3.org/TR/charmod/

I don't remember the exact outcome myself, except that it's a really 
ugly problem that truly requires some changes in what options webmasters 
give to web content creators.


--
Elliotte Rusty Harold  [EMAIL PROTECTED]
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/

Re: [whatwg] The problems with namespaces in text/html

2006-11-05 Thread Lachlan Hunt


Elliotte Harold wrote:

Henri Sivonen wrote:

A conforming HTML5 byte stream is *never* a well-formed XML 1.0 byte 
stream. 


Really? Never?


Yes, never!  For one, a conforming HTML 5 (not XHTML 5) document 
requires the DOCTYPE to be <!DOCTYPE html> and that is not well-formed XML.


There are many HTML 4 documents that are well-formed XML documents. 
Are these not legal HTML 5 documents?


No.  That's like asking whether a valid HTML 3.2 document is a conforming 
HTML 4.01 document.  No, it's not!


--
Lachlan Hunt
http://lachy.id.au/

Re: [whatwg] The problems with namespaces in text/html (Was: MathML-in-HTML5)

2006-11-05 Thread Elliotte Harold


Lachlan Hunt wrote:

Why is the specific syntax so important?  If, in HTML (not XHTML), 
<math> is defined to be interpreted as the math element in the MathML 
namespace, what difference does the syntax make in the end?  All HTML 
elements are already defined to be in the XHTML namespace without any 
xmlns in the syntax, so how is that any different?


The specific syntax is important because there's a huge, useful 
toolchain for processing XML and there's essentially zilch for 
processing this strange HTML 5 thing.


If there ever is any software to process it, I expect it will just be an 
adapter that feeds the HTML 5 into the XML tools. Why not ditch the HTML 
5 layer completely and simply allow the XML tools direct access?


Remember, even HTML 4 is too complex for a lot of authors. More and more 
 publishers are using CMSs and Wikis and markdown and Dreamweaver and 
similar tools. Dinosaur techies like me still editing this junk by hand 
can handle namespace prefixes, empty-element tags, and even MathML. 
(Well, maybe not raw MathML but I try.) Who, exactly, is this HTML 
serialization supposed to help?


--
Elliotte Rusty Harold  [EMAIL PROTECTED]
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/

Re: [whatwg] The problems with namespaces in text/html (Was: MathML-in-HTML5)

2006-11-05 Thread Elliotte Harold


Lachlan Hunt wrote:

I'm not sure why that bothers you. As long as things are well-formed, 
what's the harm?


text/html *does not* enforce well-formedness and *never will*.  That's 
the problem!


application/xhtml+xml doesn't enforce well-formedness either. That's the 
job of the parser.


If a server sends text/html, and the resulting character stream parses 
as well-formed, then it is well-formed. If it doesn't, it isn't. What 
matters more is what the stream is, not what the server says it is. 
(Yes, I know there are a few details about character encoding detection 
that come into play here. That doesn't change the point, though.)
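
A minimal sketch of that point, in Python, using the standard library's expat-based parser. The helper name is_well_formed is hypothetical; the check looks only at the bytes and never at the media type the server claimed.

```python
import xml.etree.ElementTree as ET


def is_well_formed(data: bytes) -> bool:
    """Report whether `data` parses as well-formed XML.

    The Content-Type label plays no part: only the byte stream is tested.
    """
    try:
        ET.fromstring(data)
    except ET.ParseError:
        # Any well-formedness error (unclosed tag, bad nesting, ...) lands here.
        return False
    return True
```

A stream served as text/html passes this check if every tag is closed and properly nested; a stream served as application/xhtml+xml fails it if one is not.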


--
Elliotte Rusty Harold  [EMAIL PROTECTED]
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/

Re: [whatwg] The problems with namespaces in text/html

2006-11-05 Thread Anne van Kesteren

On Sun, 05 Nov 2006 13:12:44 +0100, Elliotte Harold  
[EMAIL PROTECTED] wrote:
I don't care what appears in the DOM. My model is not the DOM. Most  
models are not the DOM.


Sorry, but the model of the web is the DOM, whether you like it or not.


--
Anne van Kesteren
http://annevankesteren.nl/
http://www.opera.com/

Re: [whatwg] Custom elements and attributes

2006-11-05 Thread Lachlan Hunt


Elliotte Harold wrote:

Spurious Cp1252 is a real problem. I'm not sure what HTML 5 should do here.


At the very least, ISO-8859-1 must be treated as Windows-1252.  I'm not 
sure about the other ISO-8859 encodings.  Numeric and hex character 
references from 128 to 159 must also be treated as Windows-1252 code points.
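
A minimal sketch of that remapping, in Python. The function name is hypothetical, and mapping the five bytes that Windows-1252 leaves unassigned to U+FFFD is an assumption, not something settled in this thread.

```python
def resolve_numeric_reference(code_point: int) -> str:
    """Resolve a numeric character reference such as &#150; or &#x96;.

    Values 128..159 are reinterpreted as Windows-1252 bytes instead of
    C1 control characters; all other values map straight to Unicode.
    """
    if 0x80 <= code_point <= 0x9F:
        try:
            return bytes([code_point]).decode("cp1252")
        except UnicodeDecodeError:
            # 0x81, 0x8D, 0x8F, 0x90 and 0x9D are unassigned in Python's
            # cp1252 codec; U+FFFD is an assumed fallback.
            return "\ufffd"
    return chr(code_point)


def decode_latin1_as_cp1252(body: bytes) -> str:
    """Decode bytes labelled ISO-8859-1 as if they were Windows-1252."""
    return body.decode("cp1252", errors="replace")
```

With this, &#150; yields an en dash (U+2013) and bytes 0x93 and 0x94 in a document labelled ISO-8859-1 yield curly quotes, which is what authors of such content almost always meant.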


--
Lachlan Hunt
http://lachy.id.au/

Re: [whatwg] Custom elements and attributes

2006-11-05 Thread Anne van Kesteren

On Sun, 05 Nov 2006 13:50:04 +0100, Elliotte Harold  
[EMAIL PROTECTED] wrote:

[...]

Spurious Cp1252 is a real problem. In fact, incorrectly labeled encoding  
is a real problem, and a thorny one. Draconian error handling in XML  
solves this, but I'm not sure what HTML 5 should do here. It's worth  
thinking about though. It's also worth reviewing the work the W3C TAG  
and I18N working groups did on this issue since a lot of smart people  
did a lot of thinking about this quite recently:


http://www.w3.org/2001/tag/doc/mime-respect-20060412
http://www.w3.org/TR/charmod/

I don't remember the exact outcome myself, except that it's a really  
ugly problem that truly requires some changes in what options webmasters  
give to web content creators.


How does requiring changes solve the problem for content that's out there?  
This doesn't make much sense to me.



--
Anne van Kesteren
http://annevankesteren.nl/
http://www.opera.com/

[whatwg] Spelling error: labelled -> labeled

2006-11-05 Thread Elliotte Harold

Multiple times in the draft "labelled" should be changed to "labeled" 
(unless maybe this is a British spelling?) I always get this one wrong 
myself, and wouldn't have noticed if the spell checker in Thunderbird 
hadn't complained.


http://dictionary.reference.com/browse/labeled

--
Elliotte Rusty Harold  [EMAIL PROTECTED]
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/

Re: [whatwg] The problems with namespaces in text/html

2006-11-05 Thread Elliotte Harold


Lachlan Hunt wrote:

Yes, never!  For one, a conforming HTML 5 (not XHTML 5) document 
requires the DOCTYPE to be <!DOCTYPE html> and that is not well-formed XML.




OK. I thought it might be something like that. I just couldn't find it 
in skimming the spec.


--
Elliotte Rusty Harold  [EMAIL PROTECTED]
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/

Re: [whatwg] The problems with namespaces in text/html

2006-11-05 Thread Elliotte Harold


Anne van Kesteren wrote:


Sorry, but the model of the web is the DOM, whether you like it or not.


*The* model of the Web is not DOM. *The* model of the Web does not 
exist. There are multiple instances of *a* model of the Web. Several of 
these call themselves DOM, though I'm sure this group knows quite well 
just how  different they really are.


The map is not the world. The model is not the document. Any model is 
just a convenient local representation. The reality of the document is 
its text.


--
Elliotte Rusty Harold  [EMAIL PROTECTED]
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/

Re: [whatwg] Custom elements and attributes

2006-11-05 Thread Elliotte Harold


Anne van Kesteren wrote:

How does requiring changes solve the problem for content that's out 
there? This doesn't make much sense to me.




The specific problem is that an author may publish a correctly labeled 
UTF-8 or ISO-8859-8 document or some such. However the server sends a 
Content-type header that requires the parser to treat the document as 
ISO-8859-1 or US-ASCII or something else.


The need is for server administrators to allow content authors to 
specify content types and character sets for the documents they write. 
The content doesn't need to change. The authors just need the ability to 
 specify the server headers for their documents.
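
A minimal sketch of the failure, in Python. The helper name is hypothetical, and falling back to ISO-8859-1 when the header names no charset is an assumption about the receiving parser.

```python
from email.message import Message


def decode_as_labelled(body: bytes, content_type: str) -> str:
    """Decode `body` with the charset named in a Content-Type header value."""
    header = Message()
    header["Content-Type"] = content_type
    # get_content_charset() returns the charset parameter, lowercased,
    # or None when the header carries no charset parameter.
    charset = header.get_content_charset() or "iso-8859-1"
    return body.decode(charset)


# The author saved the file as UTF-8; only the server's label differs.
body = "café".encode("utf-8")
right = decode_as_labelled(body, "text/html; charset=UTF-8")
wrong = decode_as_labelled(body, "text/html; charset=ISO-8859-1")
```

The bytes are identical in both calls. Only the header changes, which is why the fix belongs in the server configuration and not in the content.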


--
Elliotte Rusty Harold  [EMAIL PROTECTED]
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/

Re: [whatwg] Spelling error: labelled -> labeled

2006-11-05 Thread Lachlan Hunt


Elliotte Harold wrote:
Multiple times in the draft "labelled" should be changed to "labeled" 
(unless maybe this is a British spelling?)


"labeled" is the en-US spelling, "labelled" is the correct spelling in 
en-GB, en-AU and many other countries.



wouldn't have noticed if the spell checker in Thunderbird hadn't complained.


If you change to the en-AU or en-GB dictionary, it will complain about 
the other.


--
Lachlan Hunt
http://lachy.id.au/

Re: [whatwg] getElementsByClassName() idea

2006-11-05 Thread Alexey Feldgendler

On Sun, 05 Nov 2006 16:18:32 +0600, Anne van Kesteren [EMAIL PROTECTED] wrote:

 I think this hasn't been suggested before. Perhaps the method should
 accept a DOMTokenString as argument instead of an array. This allows
 things like ele.getElementsByClassName(ele.className) etc. The only
 problem is how to get a DOMTokenString without first getting .className
  from somewhere. Perhaps it should be a constructor as well. 'x = new
 DOMTokenString("aaa bbb")'

 How is it better than DOMString?

 It inherits from DOMString.
 http://www.whatwg.org/specs/web-apps/current-work/#domtokenstring defines
 it.

I still don't see what the advantage is of having getElementsByClassName take 
a DOMTokenString argument over a plain DOMString.


-- 
Alexey Feldgendler [EMAIL PROTECTED]
[ICQ: 115226275] http://feldgendler.livejournal.com

Re: [whatwg] getElementsByClassName() idea

2006-11-05 Thread Anne van Kesteren

On Sun, 05 Nov 2006 14:27:14 +0100, Alexey Feldgendler  
[EMAIL PROTECTED] wrote:
I still don't see what the advantage is of having  
getElementsByClassName take a DOMTokenString argument over a plain  
DOMString.


Oh right, sorry. Yeah, I suppose a DOMString makes more sense.
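
A minimal sketch of the plain-string variant, in Python and not against any real DOM. The Element class and the function name are hypothetical stand-ins, and treating an empty token list as matching nothing is an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Element:
    """Toy stand-in for a DOM element: a tag, a class string, children."""
    tag: str
    class_name: str = ""
    children: List["Element"] = field(default_factory=list)


def get_elements_by_class_name(root: Element, class_names: str) -> List[Element]:
    """Return descendants of `root` carrying every whitespace-separated
    token in `class_names`, in document order."""
    wanted = set(class_names.split())
    matches: List[Element] = []

    def walk(node: Element) -> None:
        for child in node.children:
            # Token order and duplicates in either string are irrelevant.
            if wanted and wanted <= set(child.class_name.split()):
                matches.append(child)
            walk(child)

    walk(root)
    return matches
```

Because the argument is an ordinary string, a call like get_elements_by_class_name(root, some_element.class_name) works without any special token type.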


--
Anne van Kesteren
http://annevankesteren.nl/
http://www.opera.com/

Re: [whatwg] The problems with namespaces in text/html

2006-11-05 Thread Henri Sivonen


On Nov 5, 2006, at 15:12, Elliotte Harold wrote:


Anne van Kesteren wrote:

Sorry, but the model of the web is the DOM, whether you like it or  
not.


*The* model of the Web is not DOM.


The model in the browsers that matter is the DOM. Unfortunately. But  
it is too late to change it. And having even that level of interop is  
great.


The map is not the world. The model is not the document. Any model  
is just a convenient local representation. The reality of the  
document is its text.


Actually, the reality of an (X)HTML5 document is in the abstract syntax  
tree parsed out of either serialization.


In browsers that allow scripting, the tree has to be exposed via the  
DOM API and has to implement the DOM notion of the tree data model.  
But if you are writing a non-browser app that doesn't do scripting,  
you could use an HTML5 parser that emits SAX2 events and you could  
construct a XOM tree out of those. You don't need to care whether the  
SAX2 events came from an HTML5 parser or from an XML parser.
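
A minimal sketch of that decoupling, in Python. Python's html.parser is not an HTML5 parser and is used here only as a stand-in; the events are plain, non-namespace SAX callbacks, not full SAX2; and the class and function names are hypothetical.

```python
import xml.sax
from html.parser import HTMLParser
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesImpl


class OutlineHandler(ContentHandler):
    """Consumer of SAX events; it has no idea which parser produced them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))

    def characters(self, content):
        if content.strip():
            self.events.append(("text", content.strip()))


class HtmlToSax(HTMLParser):
    """Adapter: forwards html.parser callbacks to a SAX ContentHandler."""

    def __init__(self, handler):
        super().__init__()
        self.handler = handler

    def handle_starttag(self, tag, attrs):
        self.handler.startElement(tag, AttributesImpl(dict(attrs)))

    def handle_endtag(self, tag):
        self.handler.endElement(tag)

    def handle_data(self, data):
        self.handler.characters(data)


def events_from_xml(text):
    handler = OutlineHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.events


def events_from_html(text):
    handler = OutlineHandler()
    parser = HtmlToSax(handler)
    parser.feed(text)
    parser.close()
    return handler.events
```

For markup that both parsers accept, the handler records the same events from either path, so whatever tree is built on top of it does not depend on the serialization.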


--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/

Re: [whatwg] The problems with namespaces in text/html

2006-11-05 Thread Bjoern Hoehrmann

* Lachlan Hunt wrote:
Yes, never!  For one, a conforming HTML 5 (not XHTML 5) document 
requires the DOCTYPE to be <!DOCTYPE html> and that is not well-formed XML.

Yes it is.
-- 
Björn Höhrmann · mailto:[EMAIL PROTECTED] · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/

Re: [whatwg] The problems with namespaces in text/html

2006-11-05 Thread Elliotte Harold


Henri Sivonen wrote:

The model in the browsers that matter is the DOM. Unfortunately. But it 
is too late to change it. And having even that level of interop is great.



But as I keep saying, *it's not just browsers*. There's a lot more 
happening on the Web than classic desktop browsers. A document posted to 
the Web is available to all sorts of clients. Some of the most 
interesting things happen when there's no human in the loop to look at a 
browser.


--
Elliotte Rusty Harold  [EMAIL PROTECTED]
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/

Re: [whatwg] The problems with namespaces in text/html

2006-11-05 Thread Elliotte Harold


Bjoern Hoehrmann wrote:

* Lachlan Hunt wrote:
Yes, never!  For one, a conforming HTML 5 (not XHTML 5) document 
requires the DOCTYPE to be <!DOCTYPE html> and that is not well-formed XML.


Yes it is.



Good catch. I forgot that. There are one or two XML parsers that blow 
this one, but they're not much used. The specific BNF production is:


doctypedecl  ::= '<!DOCTYPE' S  Name (S  ExternalID)? S? ('[' intSubset 
']' S?)? '>'


The external ID and the internal subset are both optional.

Is there anything else that stops every HTML5 document from being a 
well-formed XML document?
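
A minimal sketch that checks the production against a real parser, in Python, using the standard library's expat-based minidom. The helper name describe_doctype and the sample document are assumptions made for illustration.

```python
from xml.dom import minidom


def describe_doctype(text: str):
    """Parse `text` as XML and return (name, public_id, system_id) of its
    document type declaration. Raises if the text is not well-formed."""
    document = minidom.parseString(text)
    doctype = document.doctype
    return (doctype.name, doctype.publicId, doctype.systemId)


SAMPLE = (
    "<!DOCTYPE html>"
    "<html><head><title>t</title></head>"
    "<body><p>x</p></body></html>"
)
name, public_id, system_id = describe_doctype(SAMPLE)
```

The declaration parses with a name and no identifiers, so the DOCTYPE alone does not make a document malformed; any remaining obstacle would have to come from elsewhere in the syntax.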


--
Elliotte Rusty Harold  [EMAIL PROTECTED]
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/

Re: [whatwg] The problems with namespaces in text/html (Was: MathML-in-HTML5)

2006-11-05 Thread Lachlan Hunt


Elliotte Harold wrote:

Lachlan Hunt wrote:

Why is the specific syntax so important?


The specific syntax is important because there's a huge, useful 
toolchain for processing XML and there's essentially zilch for 
processing this strange HTML 5 thing.


In the real world, you cannot expect to be able to process content 
served as text/html using XML tools.  That's like trying to compile Java 
with a C++ compiler.  Despite the similarities in syntax, they are 
different languages that require different parsers.


Why not ditch the HTML 5 layer completely and simply allow the XML tools 
direct access?


Because we have to remain compatible with the web, where there are an 
infinite number of existing documents that browsers must be able to 
handle interoperably.



Who, exactly, is this HTML serialization supposed to help?


Anyone for whom interoperability in processing real world content is 
important.  This includes, among others:


* Browser vendors that have to deal with real world content.
* CMS, editor, and other tool vendors that have to accept HTML input 
from users.

* Authors that have to develop for the real world.
* Users who like to surf the web in any browser they choose.

--
Lachlan Hunt
http://lachy.id.au/

Re: [whatwg] The problems with namespaces in text/html

2006-11-05 Thread Bjoern Hoehrmann

* Elliotte Harold wrote:
Really? Never? There are many HTML 4 documents that are well-formed XML 
documents. Are these not legal HTML 5 documents? I scanned the spec 
quickly, but I didn't find anything that was flat out forbidden by XML. 

For a document to be a HTML 4 document it would need a HTML 4.01
document type declaration, and for a document to be a well-formed
XML document, it would have to have no HTML 4.01 document type
declaration. There are not many documents that both have and don't
have a HTML 4.01 document type declaration. Remember that the DTD
needs to match the production for external subsets.
-- 
Björn Höhrmann · mailto:[EMAIL PROTECTED] · http://bjoern.hoehrmann.de
Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de
68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/

Re: [whatwg] The problems with namespaces in text/html

2006-11-05 Thread Henri Sivonen


On Nov 5, 2006, at 15:54, Elliotte Harold wrote:


Henri Sivonen wrote:

The model in the browsers that matter is the DOM. Unfortunately.  
But it is too late to change it. And having even that level of  
interop is great.


But as I keep saying, *it's not just browsers*. There's a lot more  
happening on the Web than classic desktop browsers. A document  
posted to the Web is available to all sorts of clients. Some of the  
most interesting things happen when there's no human in the loop to  
look at a browser.


If your app does not run scripts from the Web, you are exempt from  
using the DOM as the model.


Quoting the spec:

User agents with no scripting support

Implementations that do not support scripting (or which have
their scripting features disabled) are exempt from supporting
the events and DOM interfaces mentioned in this specification.
For the parts of this specification that are defined in terms
of an events model or in terms of the DOM, such user agents
must still act as if events and the DOM were supported.


I predict that in practice many non-browser apps will be in violation  
of the last sentence, because their authors will see more value in  
being able to use a streaming API without buffering or in being able  
to decouple the parser from the tree builder than in having  
interoperable error recovery.


--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/

Re: [whatwg] The problems with namespaces in text/html

2006-11-05 Thread Lachlan Hunt


Bjoern Hoehrmann wrote:

* Lachlan Hunt wrote:
Yes, never!  For one, a conforming HTML 5 (not XHTML 5) document 
requires the DOCTYPE to be <!DOCTYPE html> and that is not well-formed XML.


Yes it is.


Oh, sorry.  You're right.  I thought it required a PUBLIC or SYSTEM 
identifier.  I got confused because, unlike SGML, you can't have a 
public identifier without a system identifier.


--
Lachlan Hunt
http://lachy.id.au/

Re: [whatwg] The problems with namespaces in text/html (Was: MathML-in-HTML5)

2006-11-05 Thread Elliotte Harold


Lachlan Hunt wrote:

Why not ditch the HTML 5 layer completely and simply allow the XML 
tools direct access?


Because we have to remain compatible with the web, where there are an 
infinite number of existing documents that browsers must be able to 
handle interoperably.


You're getting this backwards. There's no reason for HTML 5 to be 
compatible with existing *documents*, existing browsers and tools sure; 
but other documents can be handled on their own.



Who, exactly, is this HTML serialization supposed to help?


Anyone for whom interoperability in processing real world content is 
important.  This includes, among others:


* Browser vendors that have to deal with real world content.


Browser vendors can handle XHTML now. It's a non-issue for them.

* CMS, editor, and other tool vendors that have to accept HTML input 
from users.


They mostly don't use HTML now. Instead they use things like markdown. 
If they do accept HTML, they tidy it up in various ways for security 
reasons, if not for well-formedness. Adding well-formedness checking to 
those that don't is quite simple. It's a question of will and desire, 
not ability.



* Authors that have to develop for the real world.


I have no idea what you mean by this.  I suspect it's redundant.


* Users who like to surf the web in any browser they choose.


As long as any browser they choose is some browser released in this 
millennium, XHTML is fine.


I'm sorry. The use cases so far just don't hold water. I repeat the 
question: who does HTML serialization help? What problems does this solve?


--
Elliotte Rusty Harold  [EMAIL PROTECTED]
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/

Re: [whatwg] The problems with namespaces in text/html

2006-11-05 Thread Elliotte Harold


Anne van Kesteren wrote:

Well, the problem is that they would mean different things. Consider the 
following fragment:


Meaning is in the eye of the beholder. In point of fact, there are a 
lot more than two different things the fragment you propose might mean. 
Meaning is determined locally by each recipient for its own unique 
purposes, which may or may not be anything close to what the document 
producer expects.


The idea that the server can somehow impose its interpretation of the 
content on the recipient is an illusion. It's never been true, and it's 
never going to be true, no matter how many specs you write.


The syntax matters. If you give me the right well-formed syntax, I can 
do what I need to do with it, as can others. If you give me malformed 
syntax, working with the document gets a lot more complicated. My 
concern is not about browser vendors agreeing on one interpretation 
that's somehow useful to them. It's making sure that they don't in the 
process break everything else anyone else might want to do with these 
documents.


Where's Walter Perry when you need him? :-)

--
Elliotte Rusty Harold  [EMAIL PROTECTED]
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/

Re: [whatwg] The problems with namespaces in text/html

2006-11-05 Thread Anne van Kesteren

On Sun, 05 Nov 2006 15:29:29 +0100, Elliotte Harold  
[EMAIL PROTECTED] wrote:

 * Browser vendors that have to deal with real world content.


Browser vendors can handle XHTML now. It's a non-issue for them.


Working for one I can assure you it's very much an issue.


--
Anne van Kesteren
http://annevankesteren.nl/
http://www.opera.com/

Re: [whatwg] The problems with namespaces in text/html

2006-11-05 Thread Henri Sivonen


On Nov 5, 2006, at 14:30, Elliotte Harold wrote:


Henri Sivonen wrote:

Personally, I think MathML is so hopelessly verbose for hand  
authoring that this really shouldn't be about enabling hand  
authoring MathML-in-HTML5 but about enabling MathML-in-HTML5  
(perhaps generated by a future version of itex2mml or similar) to  
be served through content management systems that are not built  
around a SAX pipeline or an XML tree API or XSLT but are built as  
tag soup systems and simply cannot guarantee well-formedness. I  
mean systems like WordPress and MovableType.


Please don't confuse what these systems won't do with what they  
can't do.  I personally wrote a system like this that maintained  
full well-formedness at all times despite TagSoup input. In fact,  
that was one of the easiest parts of what it did. See


http://cafe.elharo.com/web/mokka/


http://hsivonen.iki.fi/validator/?doc=http%3A%2F%2Fcafe.elharo.com%2Fweb%2Fmokka%2F&parser=xml&laxtype=yes


I think that makes my point for me.

Today's tools and libraries make it easy for anyone using anything  
more advanced than a text editor and FTP to publish well-formed  
documents. Tools that don't do that by default should be fixed.


In order for a non-trivial tool to produce well-formed output  
consistently and reliably, the architecture of the tool needs to be  
designed with this goal in mind. It is remarkably difficult to make  
e.g. WordPress or MovableType emit stuff that is for sure suitable  
for serving as application/xhtml+xml.


--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/

Re: [whatwg] The problems with namespaces in text/html

2006-11-05 Thread Henri Sivonen


On Nov 5, 2006, at 16:39, Elliotte Harold wrote:


Henri Sivonen wrote:

Is there anything else that stops every HTML5 document from being  
a well-formed XML document?

Case-insensitivity and empty elements for example.


These would stop some documents from being well-formed, not all.  
I'm sure you're allowed to use all lower case. Can you use empty- 
element tags if you wish? Or must it be <br> and not <br /> or  
<br></br>?


It must be <br> to be conforming.

You are still stuck with the syntactic similarity of XML and HTML5.  
You wouldn't use an XML parser to parse RELAX NG Compact Syntax,  
would you?


Also, even if a subset of HTML5 documents happened to be parseable as  
XML, it doesn't help unless the authors whose documents you consume  
happen to only produce that subset. If your app insists on using an  
XML parser for text/html content, it isn't very useful for processing  
the stuff found on the Web.


But in any case, even if an HTML5 byte stream happens to be  
parseable as XML 1.0, you get the wrong infoset if you use an XML  
parser instead of an HTML5 parser.


Walter, we need you!

There is no right infoset. There is no wrong infoset.


Given this HTML document:
<!DOCTYPE html><HTML><title>Foo</Title><p>bar</html>
a parser should convey to the application a tree that has the  
following features:
 * There is a root element node with the local name "html" in the  
"http://www.w3.org/1999/xhtml" namespace.
 * The root element node has two child nodes.
 * The root element node has an element node with the local name  
"head" in the "http://www.w3.org/1999/xhtml" namespace as its first  
child.
 * The root element node has an element node with the local name  
"body" in the "http://www.w3.org/1999/xhtml" namespace as its last  
child.
 * The first child of the root element has a single child node,  
which is an element node with the local name "title" in the  
"http://www.w3.org/1999/xhtml" namespace.
 * The last child of the root element has a single child node,  
which is an element node with the local name "p" in the  
"http://www.w3.org/1999/xhtml" namespace.
 * The element with the local name "title" in the  
"http://www.w3.org/1999/xhtml" namespace has a single child node,  
which is a text node with the value "Foo".
 * The element with the local name "p" in the  
"http://www.w3.org/1999/xhtml" namespace has a single child node,  
which is a text node with the value "bar".


If your parser reports something else, it is not suitable for parsing  
HTML5 and is *wrong* per spec.
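
As a rough illustration of the point (not part of the original message), the sketch below hands the example document, with its markup written out, to Python's standard-library XML parser and to the standard-library html.parser tokenizer. Note the assumptions: html.parser is only a tolerant tokenizer, not a spec-conformant HTML5 parser, so it does not create the implied head and body elements described above; the names parses_as_xml, TagLogger and html_tags are made up for this example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The example document from the message, with angle brackets written out.
DOC = "<!DOCTYPE html><HTML><title>Foo</Title><p>bar</html>"


def parses_as_xml(source):
    """Return True if an XML parser accepts the source as well-formed."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False


class TagLogger(HTMLParser):
    """Records tag names as the HTML tokenizer reports them (lowercased)."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_endtag(self, tag):
        self.tags.append("/" + tag)


def html_tags(source):
    """Tokenize the source as HTML and return the sequence of tags seen."""
    logger = TagLogger()
    logger.feed(source)
    logger.close()
    return logger.tags


if __name__ == "__main__":
    print(parses_as_xml(DOC))  # the XML parser rejects the mismatched tags
    print(html_tags(DOC))      # the HTML tokenizer reports lowercase names
```

The XML parser stops at the mismatched title/Title pair, while the HTML tokenizer treats tag names case-insensitively. Building the full tree with the implied head and body elements would need a real HTML5 tree builder, which this sketch does not attempt.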



the infoset I derive from the document is my concern, not yours.


You want certain stuff to be in a particular namespace. From this  
thread, it seems that you want to make it my problem to produce  
particular namespace declaration syntax--instead of making it your  
concern to use an HTML5 parser.


--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/

Re: [whatwg] The problems with namespaces in text/html

2006-11-05 Thread Henri Sivonen


On Nov 5, 2006, at 16:35, Elliotte Harold wrote:


Anne van Kesteren wrote:

Well, the problem is that they would mean different things.  
Consider the following fragment:


Meaning is in the eye of the beholder. In point of fact, there  
are a lot more than two different things the fragment you propose  
might mean. Meaning is determined locally by each recipient for its  
own unique purposes, which may or may not be anything close to what  
the document producer expects.


Well, the whole point of having a spec is to give a mutual  
understanding to the producer and consumer of what the bytes mean.


The syntax matters. If you give me the right well-formed syntax, I  
can do what I need to do with it, as can others. If you give me  
malformed syntax, working with the document gets a lot more  
complicated. My concern is not about browser vendors agreeing on  
one interpretation that's somehow useful to them. It's making sure  
that they don't in the process break everything else anyone else  
might want to do with these documents.


Everyone who wishes to process HTML5 with XML tools is going to need  
an HTML5 parser that exposes an interface that makes the HTML5 parser  
look like an XML parser to the rest of the tool chain. The part of  
the XML tool chain that you can't use with HTML5 is the XML parser  
itself.
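
A minimal sketch of that arrangement, under stated assumptions: Python's html.parser stands in for a real HTML5 parser (it is only a tokenizer and does no HTML5 tree construction or error recovery), and the class and function names HtmlToSax, Recorder and sax_events are hypothetical. The adapter reports every element to a SAX ContentHandler in the XHTML namespace, so the handler never sees the source bytes or any namespace declaration syntax.

```python
from html.parser import HTMLParser
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesNSImpl

XHTML_NS = "http://www.w3.org/1999/xhtml"


class HtmlToSax(HTMLParser):
    """Forwards HTML tokenizer events to a namespace-aware SAX handler."""

    def __init__(self, handler):
        super().__init__(convert_charrefs=True)
        self.handler = handler

    def handle_starttag(self, tag, attrs):
        # Attributes are reported in no namespace; valueless ones become "".
        values = {(None, name): (value or "") for name, value in attrs}
        qnames = {(None, name): name for name, _ in attrs}
        self.handler.startElementNS(
            (XHTML_NS, tag), tag, AttributesNSImpl(values, qnames)
        )

    def handle_endtag(self, tag):
        self.handler.endElementNS((XHTML_NS, tag), tag)

    def handle_data(self, data):
        self.handler.characters(data)


class Recorder(ContentHandler):
    """Example downstream consumer: it only ever sees SAX events."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElementNS(self, name, qname, attrs):
        self.events.append(("start", name))

    def endElementNS(self, name, qname):
        self.events.append(("end", name))

    def characters(self, content):
        self.events.append(("text", content))


def sax_events(html_source):
    """Run the adapter over html_source and return the recorded events."""
    recorder = Recorder()
    recorder.startDocument()
    adapter = HtmlToSax(recorder)
    adapter.feed(html_source)
    adapter.close()
    recorder.endDocument()
    return recorder.events
```

Anything written against ContentHandler (a serializer, a filter, a tree builder) could take the place of Recorder here; only the front end differs from an XML pipeline.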


--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/

Re: [whatwg] The problems with namespaces in text/html (Was: MathML-in-HTML5)

2006-11-05 Thread Henri Sivonen


On Nov 5, 2006, at 16:29, Elliotte Harold wrote:


Lachlan Hunt wrote:

Why not ditch the HTML 5 layer completely and simply allow the  
XML tools direct access?
Because we have to remain compatible with the web, where there are  
an infinite number of existing documents that browsers must be  
able to handle interoperably.


You're getting this backwards. There's no reason for HTML 5 to be  
compatible with existing *documents*, existing browsers and tools  
sure; but other documents can be handled on their own.


The HTML5 parsing algorithm is not about adding a third parser  
alongside old HTML and XML. It is about defining a parsing algorithm  
for text/html content in general--including content that purports to  
conform to older HTML specs.



Browser vendors can handle XHTML now. It's a non-issue for them.


Having worked recently on improving XHTML handling in Gecko, I assure  
you that XHTML is not a non-issue from the browser point of view.


* CMS, editor, and other tool vendors that have to accept HTML  
input from users.


They mostly don't use HTML now. Instead they use things like  
markdown. If they do accept HTML, they tidy it up in various ways  
for security reasons, if not for well-formedness. Adding well- 
formedness checking to those that don't is quite simple. It's a  
question of will and desire, not ability.


You are underestimating the bozo factor on ability and the effect of  
legacy code even when there is desire.


--
Henri Sivonen
[EMAIL PROTECTED]
http://hsivonen.iki.fi/

Re: [whatwg] The problems with namespaces in text/html (Was: MathML-in-HTML5)

2006-11-05 Thread Elliotte Harold


Henri Sivonen wrote:

If my parser tells your ContentHandler that there is a namespace, then 
there is. Your ContentHandler doesn't get to see any source bytes.


But you don't send me a parser and you don't talk to my ContentHandler. 
Your HTTP server sends me a stream of bytes that my parser then 
processes. All I get from you is bytes. This is what we exchange; not 
infosets, not DOM trees, not trees of any sort.


--
Elliotte Rusty Harold  [EMAIL PROTECTED]
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/

Re: [whatwg] The problems with namespaces in text/html

2006-11-05 Thread Elliotte Harold


Henri Sivonen wrote:


http://cafe.elharo.com/web/mokka/


http://hsivonen.iki.fi/validator/?doc=http%3A%2F%2Fcafe.elharo.com%2Fweb%2Fmokka%2F&parser=xml&laxtype=yes



I think that makes my point for me.


Not really. That's not the system I was talking about. The article at 
the URL I referenced describes the old system as well as the reasons I 
switched to WordPress, but that had nothing to do with well-formedness. 
My point is that well-formedness is not hard to maintain if the server 
vendor cares to do so. The problem is that many server vendors don't 
understand why they should make even a small effort in this direction. 
Catering to their problems does not strike me as a wise course for the Web.


The URL timed out when I tried to use your validator so I'm not sure 
what I saw. xmllint did pick up one problem with an unescaped ampersand 
in some third party code that draconian error handling would have 
noticed weeks ago. I'll have to bug that provider to fix that.


--
Elliotte Rusty Harold  [EMAIL PROTECTED]
Java I/O 2nd Edition Just Published!
http://www.cafeaulait.org/books/javaio2/
http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/

Re: [whatwg] colspan=0

2006-11-05 Thread Matthew Paul Thomas


On Nov 4, 2006, at 1:53 AM, Henri Sivonen wrote:


None of Opera 9.02, Firefox 2.0, IE7 and Safari 2.0.4 implement 
colspan=0 as specified in HTML 4.01. Trident, Presto and WebKit at 
least agree on what to do with it: they treat it like colspan=1.


I suggest that only positive integers be conforming and that 
non-conforming values be treated as 1.

...


I know browser vendors have had a long time to implement this, but 
still, I think giving up on it would be a shame. The number of rows or 
columns in a table is often rather expensive to calculate ahead of 
time. As long as this has to be done to calculate the rowspan= or 
colspan= of header cells, this can substantially increase the time an 
application takes to generate a table. For the browser to interpret 
colspan=0 or rowspan=0 instead would both make life easier for 
application authors, and make such pages faster overall.
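
To make the two positions concrete, here is a small sketch with hypothetical function names. The first function follows the HTML 4.01 reading, in which a value of 0 spans from the current column to the last column of the column group; the second follows the proposal quoted above, in which anything other than a positive integer is treated as 1. Column numbers are assumed to be 1-based, and the string parsing is a simplification rather than any spec's algorithm.

```python
def colspan_html401(value, current_col, group_last_col):
    """HTML 4.01 reading: 0 means 'span to the end of the column group'."""
    if value == 0:
        return group_last_col - current_col + 1
    return value


def colspan_proposed(raw):
    """Proposal in this thread: non-positive or unparsable values act as 1."""
    try:
        value = int(raw)
    except (TypeError, ValueError):
        return 1
    return value if value > 0 else 1


if __name__ == "__main__":
    # A header cell starting in column 2 of a 5-column group.
    print(colspan_html401(0, current_col=2, group_last_col=5))
    print(colspan_proposed("0"))
```

With the first reading the generating application can emit the header cell before it knows how many columns follow; with the second it has to count them first.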


--
Matthew Paul Thomas
http://mpt.net.nz/
