Re: [whatwg] Footnotes, endnotes, sidenotes
On Sat, 04 Nov 2006 19:21:42 +0600, Matthew Paul Thomas [EMAIL PROTECTED] wrote:
>> Footnotes and endnotes are identical in content in the context of a print document and I am not certain how they'd differ even presentationally on a web page, so yes, I think those can be considered identical in terms of markup.
> Scholarly books sometimes use both footnotes and endnotes for different things -- footnotes for citations and endnotes for tangential discussions, or vice versa. I've never seen an HTML document try to make this distinction, though.
That's because HTML documents can only have endnotes so far.
-- Alexey Feldgendler [EMAIL PROTECTED] [ICQ: 115226275] http://feldgendler.livejournal.com
Re: [whatwg] getElementsByClassName() idea
On Sun, 05 Nov 2006 10:55:05 +0100, Alexey Feldgendler [EMAIL PROTECTED] wrote:
>> I think this hasn't been suggested before. Perhaps the method should accept a DOMTokenString as argument instead of an array. This allows things like ele.getElementsByClassName(ele.className) etc. The only problem is how to get a DOMTokenString without first getting .className from somewhere. Perhaps it should be a constructor as well. 'x = new DOMTokenString("aaa bbb")'
> How is it better than DOMString?
It inherits from DOMString. http://www.whatwg.org/specs/web-apps/current-work/#domtokenstring defines it.
Hixie, the title attribute of the remove(token) definition says dom-tokenstring-add rather than dom-tokenstring-remove...
-- Anne van Kesteren http://annevankesteren.nl/ http://www.opera.com/
Re: [whatwg] getElementsByClassName() idea
Lachlan Hunt wrote:
> Anne van Kesteren wrote:
>> This allows things like ele.getElementsByClassName(ele.className) etc.
> Anything that accepts a DOMString will automatically accept a DOMTokenString, including getElementsByClassName. So your example will already work.
It seems getElementsByClassName has been changed to accept an array, not a DOMString, and I didn't realise.
-- Lachlan Hunt http://lachy.id.au/
Re: [whatwg] The problems with namespaces in text/html
William F Hammond wrote:
> This thread is specifically about documents on the web for _presentation_ by _browser-class_ user agents.
There is no such thing, and the sooner we realize that the better. There are documents on the Web, which may be processed by many different classes and types of agents for many different needs. These include classic desktop browsers, cell phone browsers, text mode browsers, web spiders, search engines, intelligent agents, and much more. The document format published on the Web should not prevent any of these uses. That means it certainly must be well-renderable in a desktop browser. However, we must not assume that is the only thing that will be done with it.
One of the major original goals of XML *and* HTML *and* SGML was to separate content from presentation. I am frankly shocked to see this basic principle being so off-handedly thrown out the window. That's why I'm having such a hard time believing what I'm hearing. Do people really want to reverse the course the Web has been on for almost 20 years? Especially now when diversity is increasing? In 1995 there really was only one browser of great significance, and almost all Web browsing was done in a desktop GUI. That's no longer true, and it's going to become less true as time passes.
> Have you seen this comment from TimBL? http://dig.csail.mit.edu/breadcrumbs/node/166
Most certainly. My response is here: http://cafe.elharo.com/xml/why-tim-berners-lee-is-wrong/
> Have you been involved in generating XHTML+MathML content that is presently on the web? If so, I'd like to know where so I can have a look.
Not heavily, but I've played with it. See, for example, http://www.cafeconleche.org/slides/sd2005west/xmlfundamentals/42.html http://www.cafeconleche.org/slides/sd2005west/xmlfundamentals/examples/maxwell.xml Firefox seems to handle it. Safari can't.
-- Elliotte Rusty Harold [EMAIL PROTECTED] Java I/O 2nd Edition Just Published! http://www.cafeaulait.org/books/javaio2/ http://www.amazon.com/exec/obidos/ISBN=0596527500/ref=nosim/cafeaulaitA/
Re: [whatwg] The problems with namespaces in text/html (Was: MathML-in-HTML5)
Lachlan Hunt wrote:
> No, not without namespaces, just without the xmlns and QNames syntax. e.g. when <math> is encountered in text/html, it appears in the DOM as <math xmlns="http://www.w3.org/1998/Math/MathML">
That's like saying you want to have biology but without all that yucky evolution silliness. If you don't have xmlns and xmlns:prefix then there are no namespaces, period. I think some people have drunk too much Infoset Kool-aid. Walter Perry, I knew you were right, but I didn't know how right you were.
I don't care what appears in the DOM. My model is not the DOM. Most models are not the DOM. All we have is the document's text. This is what must be defined. If there are no namespaces in the text, then there are no namespaces. The DOM is a transitory model used locally. It is not the document.
> We definitely don't want people thinking they can use any arbitrary xmlns in HTML. That's what XHTML is for.
I'm not sure why that bothers you. As long as things are well-formed, what's the harm? Existing browsers seem to deal OK and in a fairly well-defined way with content from arbitrary namespaces. (They ignore it.) I've taken advantage of this for years in my own Web pages.
-- Elliotte Rusty Harold [EMAIL PROTECTED]
Re: [whatwg] The problems with namespaces in text/html
Henri Sivonen wrote:
> Personally, I think MathML is so hopelessly verbose for hand authoring that this really shouldn't be about enabling hand authoring MathML-in-HTML5 but about enabling MathML-in-HTML5 (perhaps generated by a future version of itex2mml or similar) to be served through content management systems that are not built around a SAX pipeline or an XML tree API or XSLT but are built as tag soup systems and simply cannot guarantee well-formedness. I mean systems like WordPress and MovableType.
Please don't confuse what these systems won't do with what they can't do. I personally wrote a system like this that maintained full well-formedness at all times despite TagSoup input. In fact, that was one of the easiest parts of what it did. See http://cafe.elharo.com/web/mokka/
Today's tools and libraries make it easy for anyone using anything more advanced than a text editor and FTP to publish well-formed documents. Tools that don't do that by default should be fixed.
-- Elliotte Rusty Harold [EMAIL PROTECTED]
Re: [whatwg] The utility function for semantics in HTML
Elliotte Harold wrote:
> I suspect there are actually two axes here, and they're not orthogonal, [...] I agree we don't want to go all the way to 1 on the first axis. [...] However, I would turn the second axis all the way to about 0.99
Just to note that, in your model of non-orthogonal axes, you can't adjust the values independently like this. Moreover, your model of the axes being number of semantic elements and fraction of semantic elements doesn't work terribly well -- consider Matthew's point about the existence of <i> and <b> increasing the utility of <em> and <strong>.
-- The universe doesn't care what you believe. The wonderful thing about science is that it doesn't ask for your faith, it just asks for your eyes --- http://xkcd.com/c154.html
Re: [whatwg] The problems with namespaces in text/html
Henri Sivonen wrote:
> A conforming HTML5 byte stream is *never* a well-formed XML 1.0 byte stream.
Really? Never? There are many HTML 4 documents that are well-formed XML documents? Are these not legal HTML 5 documents? I scanned the spec quickly, but I didn't find anything that was flat out forbidden by XML. Is there some variant of the XML declaration or a DOCTYPE or requirement for an unclosed start-tag that would automatically make all HTML 5 documents malformed XML?
-- Elliotte Rusty Harold [EMAIL PROTECTED]
[whatwg] Typo in 9.2.3
> Otherwise if the next seven chacacters are a case-insensitive match for the word DOCTYPE, then consume those characters and switch to the DOCTYPE state.
chacacters -- characters
-- Elliotte Rusty Harold [EMAIL PROTECTED]
Re: [whatwg] img element comments
Lachlan Hunt wrote:
> Using attributes to describe actual metadata about an image that has real practical benefits, for both the author and user, is not presentational in my view.
Yes, but that is not what the height and width attributes are. They say nothing about the image and everything about the size at which the image is drawn.
> There's even an edge case where specifying incorrect dimensions could still be considered semantic. Unfortunately, I can't find the site I'm thinking of, but I've seen a site somewhere that created art by using small images and stretching them for the pixelation effect. In this case, stretching the image is part of the artwork's artistic value and meaning, not just its presentation, and it would lose it all if the image were shown at its actual size.
There are always edge cases. The distinction between semantics and presentation is a fuzzy one. Nonetheless, I think most of the time height and width as specified on today's img tags are clearly presentational.
-- Elliotte Rusty Harold [EMAIL PROTECTED]
Re: [whatwg] The problems with namespaces in text/html (Was: MathML-in-HTML5)
Elliotte Harold wrote:
> Lachlan Hunt wrote:
>> No, not without namespaces, just without the xmlns and QNames syntax. e.g. when <math> is encountered in text/html, it appears in the DOM as <math xmlns="http://www.w3.org/1998/Math/MathML">
> That's like saying you want to have biology but without all that yucky evolution silliness. If you don't have xmlns and xmlns:prefix then there are no namespaces, period.
In XML, that is absolutely true. However, we are talking about text/html only.
> I don't care what appears in the DOM. My model is not the DOM. Most models are not the DOM.
Does that really matter? It's the concept that matters, not the specific model used. The DOM is just a convenient model to use in discussion.
> All we have is the document's text. This is what must be defined. If there are no namespaces in the text, then there are no namespaces.
Why is the specific syntax so important? If, in HTML (not XHTML), <math> is defined to be interpreted as the math element in the MathML namespace, what difference does the syntax make in the end? All HTML elements are already defined to be in the XHTML namespace without any xmlns in the syntax, so how is that any different?
>> We definitely don't want people thinking they can use any arbitrary xmlns in HTML. That's what XHTML is for.
> I'm not sure why that bothers you. As long as things are well-formed, what's the harm?
text/html *does not* enforce well-formedness and *never will*. That's the problem!
> Existing browsers seem to deal OK and in a fairly well-defined way with content from arbitrary namespaces. (They ignore it.) I've taken advantage of this for years in my own Web pages.
Sure, in XML, that's true. But in HTML, there currently are no namespaces (unless you count IE's disastrous XML Data Islands and Custom Tags, which also don't enforce well-formedness).
-- Lachlan Hunt http://lachy.id.au/
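The XML half of this disagreement can be illustrated with a small Python sketch. It assumes the standard-library xml.etree.ElementTree parser as a stand-in for "XML tools"; the helper name qualified_name is hypothetical and not from the thread. It shows only that an XML parser puts the element in the MathML namespace when the xmlns declaration is present in the text, and not otherwise. It says nothing about how a text/html parser would behave.

```python
import xml.etree.ElementTree as ET

MATHML_NS = "http://www.w3.org/1998/Math/MathML"


def qualified_name(xml_text):
    """Return the root element's name as ElementTree reports it.

    ElementTree writes namespaced names as "{namespace-uri}local-name"
    and un-namespaced names as plain "local-name".
    """
    return ET.fromstring(xml_text).tag


# The declaration is in the text, so the parser reports the MathML namespace.
with_xmlns = qualified_name('<math xmlns="%s"><mi>x</mi></math>' % MATHML_NS)

# No declaration in the text, so an XML parser sees no namespace at all.
without_xmlns = qualified_name("<math><mi>x</mi></math>")
```

Under these assumptions the two inputs differ only in the xmlns attribute, yet an XML consumer sees two different element names.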
Re: [whatwg] Footnotes, endnotes, sidenotes
Matthew Paul Thomas wrote:
> Scholarly books sometimes use both footnotes and endnotes for different things -- footnotes for citations and endnotes for tangential discussions, or vice versa. I've never seen an HTML document try to make this distinction, though.
Distinguishing footnotes and endnotes would require a multipage document: footnotes go at the bottom of this page, endnotes at the bottom of some other page. Since HTML5 is primarily about single pages, I suggest calling any such element footnote and not having a separate endnote element. This is a good example of picking fewer over more semantics as discussed in another thread.
-- Elliotte Rusty Harold [EMAIL PROTECTED]
Re: [whatwg] Custom elements and attributes
Øistein E. Andersen wrote:
> I perfectly agree. (Actually, I think that U+7F (delete) and the C1 control characters should be excluded [transformed into U+FFFD] as well, but this could perhaps be problematic due to spurious CP1252 characters.)
Spurious Cp1252 is a real problem. In fact, incorrectly labeled encoding is a real problem, and a thorny one. Draconian error handling in XML solves this, but I'm not sure what HTML 5 should do here. It's worth thinking about though. It's also worth reviewing the work the W3C TAG and I18N working groups did on this issue since a lot of smart people did a lot of thinking about this quite recently: http://www.w3.org/2001/tag/doc/mime-respect-20060412 http://www.w3.org/TR/charmod/
I don't remember the exact outcome myself, except that it's a really ugly problem that truly requires some changes in what options webmasters give to web content creators.
-- Elliotte Rusty Harold [EMAIL PROTECTED]
Re: [whatwg] The problems with namespaces in text/html
Elliotte Harold wrote:
> Henri Sivonen wrote:
>> A conforming HTML5 byte stream is *never* a well-formed XML 1.0 byte stream.
> Really? Never?
Yes, never! For one, a conforming HTML 5 (not XHTML 5) document requires the DOCTYPE to be <!DOCTYPE html> and that is not well-formed XML.
> There are many HTML 4 documents that are well-formed XML documents? Are these not legal HTML 5 documents?
No. That's like asking is a valid HTML 3.2 document a conforming HTML 4.01 document!? No, it's not!
-- Lachlan Hunt http://lachy.id.au/
Re: [whatwg] The problems with namespaces in text/html (Was: MathML-in-HTML5)
Lachlan Hunt wrote:
> Why is the specific syntax so important? If, in HTML (not XHTML), <math> is defined to be interpreted as the math element in the MathML namespace, what difference does the syntax make in the end? All HTML elements are already defined to be in the XHTML namespace without any xmlns in the syntax, so how is that any different?
The specific syntax is important because there's a huge, useful toolchain for processing XML and there's essentially zilch for processing this strange HTML 5 thing. If there ever is any software to process it, I expect it will just be an adapter that feeds the HTML 5 into the XML tools. Why not ditch the HTML 5 layer completely and simply allow the XML tools direct access?
Remember, even HTML 4 is too complex for a lot of authors. More and more publishers are using CMSs and Wikis and markdown and Dreamweaver and similar tools. Dinosaur techies like me still editing this junk by hand can handle namespace prefixes, empty-element tags, and even MathML. (Well, maybe not raw MathML but I try.) Who, exactly, is this HTML serialization supposed to help?
-- Elliotte Rusty Harold [EMAIL PROTECTED]
Re: [whatwg] The problems with namespaces in text/html (Was: MathML-in-HTML5)
Lachlan Hunt wrote:
>> I'm not sure why that bothers you. As long as things are well-formed, what's the harm?
> text/html *does not* enforce well-formedness and *never will*. That's the problem!
application/xhtml+xml doesn't enforce well-formedness either. That's the job of the parser. If a server sends text/html, and the resulting character stream parses as well-formed, then it is well-formed. If it doesn't, it isn't. What matters more is what the stream is, not what the server says it is. (Yes, I know there are a few details about character encoding detection that come into play here. That doesn't change the point, though.)
-- Elliotte Rusty Harold [EMAIL PROTECTED]
Re: [whatwg] The problems with namespaces in text/html
On Sun, 05 Nov 2006 13:12:44 +0100, Elliotte Harold [EMAIL PROTECTED] wrote:
> I don't care what appears in the DOM. My model is not the DOM. Most models are not the DOM.
Sorry, but the model of the web is the DOM, whether you like it or not.
-- Anne van Kesteren http://annevankesteren.nl/ http://www.opera.com/
Re: [whatwg] Custom elements and attributes
Elliotte Harold wrote:
> Spurious Cp1252 is a real problem. I'm not sure what HTML 5 should do here.
At the very least, ISO-8859-1 must be treated as Windows-1252. I'm not sure about the other ISO-8859 encodings. Numeric and hex character references from 128 to 159 must also be treated as Windows-1252 code points.
-- Lachlan Hunt http://lachy.id.au/
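The two rules proposed above can be sketched in Python as follows. This is only an illustration under stated assumptions: both function names are hypothetical, Python's cp1252 codec stands in for "Windows-1252", and mapping the five byte values that codec leaves undefined to U+FFFD is an assumption of this sketch rather than something the thread specifies.

```python
def decode_latin1_label(data: bytes) -> str:
    """Decode bytes that were labelled ISO-8859-1 as if they were Windows-1252."""
    # errors="replace" turns the bytes that Python's cp1252 codec leaves
    # undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) into U+FFFD (assumed policy).
    return data.decode("cp1252", errors="replace")


def char_for_numeric_reference(code_point: int) -> str:
    """Resolve a numeric character reference such as &#147; or &#x93;.

    Values 128 to 159 are read as Windows-1252 bytes instead of C1 controls;
    every other value is taken as the Unicode code point it names.
    """
    if 128 <= code_point <= 159:
        return bytes([code_point]).decode("cp1252", errors="replace")
    return chr(code_point)
```

For example, the byte 0x93 and the reference &#147; both come out as a left curly quote (U+201C) rather than a C1 control character.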
Re: [whatwg] Custom elements and attributes
On Sun, 05 Nov 2006 13:50:04 +0100, Elliotte Harold [EMAIL PROTECTED] wrote:
> [...] Spurious Cp1252 is a real problem. In fact, incorrectly labeled encoding is a real problem, and a thorny one. Draconian error handling in XML solves this, but I'm not sure what HTML 5 should do here. It's worth thinking about though. It's also worth reviewing the work the W3C TAG and I18N working groups did on this issue since a lot of smart people did a lot of thinking about this quite recently: http://www.w3.org/2001/tag/doc/mime-respect-20060412 http://www.w3.org/TR/charmod/ I don't remember the exact outcome myself, except that it's a really ugly problem that truly requires some changes in what options webmasters give to web content creators.
How does requiring changes solve the problem for content that's out there? This doesn't make much sense to me.
-- Anne van Kesteren http://annevankesteren.nl/ http://www.opera.com/
[whatwg] Spelling error: labelled -- labeled
Multiple times in the draft labelled should be changed to labeled (unless maybe this is a British spelling?) I always get this one wrong myself, and wouldn't have noticed if the spell checker in Thunderbird hadn't complained. http://dictionary.reference.com/browse/labeled
-- Elliotte Rusty Harold [EMAIL PROTECTED]
Re: [whatwg] The problems with namespaces in text/html
Lachlan Hunt wrote:
> Yes, never! For one, a conforming HTML 5 (not XHTML 5) document requires the DOCTYPE to be <!DOCTYPE html> and that is not well-formed XML.
OK. I thought it might be something like that. I just couldn't find it in skimming the spec.
-- Elliotte Rusty Harold [EMAIL PROTECTED]
Re: [whatwg] The problems with namespaces in text/html
Anne van Kesteren wrote:
> Sorry, but the model of the web is the DOM, whether you like it or not.
*The* model of the Web is not DOM. *The* model of the Web does not exist. There are multiple instances of *a* model of the Web. Several of these call themselves DOM, though I'm sure this group knows quite well just how different they really are.
The map is not the world. The model is not the document. Any model is just a convenient local representation. The reality of the document is its text.
-- Elliotte Rusty Harold [EMAIL PROTECTED]
Re: [whatwg] Custom elements and attributes
Anne van Kesteren wrote:
> How does requiring changes solve the problem for content that's out there? This doesn't make much sense to me.
The specific problem is that an author may publish a correctly labeled UTF-8 or ISO-8859-8 document or some such. However, the server sends a Content-type header that requires the parser to treat the document as ISO-8859-1 or US-ASCII or something else. The need is for server administrators to allow content authors to specify content types and character sets for the documents they write. The content doesn't need to change. The authors just need the ability to specify the server headers for their documents.
-- Elliotte Rusty Harold [EMAIL PROTECTED]
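The mislabeling problem described above can be reproduced in a few lines of Python. This is a sketch under assumptions: the helper name charset_from_content_type is hypothetical, the standard-library email.message.Message class is used only as a convenient Content-Type parameter parser, and the ISO-8859-1 fallback for a header with no charset parameter is an assumed server and client default, not something the thread specifies.

```python
from email.message import Message


def charset_from_content_type(header_value, default="iso-8859-1"):
    """Return the charset parameter of a Content-Type value, or the default."""
    msg = Message()
    msg["Content-Type"] = header_value
    return msg.get_content_charset(default)


# Four Hebrew letters, correctly encoded by the author as UTF-8 (8 bytes).
document_bytes = "\u05e9\u05dc\u05d5\u05dd".encode("utf-8")

# The server passes the author's charset through: the text survives.
as_authored = document_bytes.decode(
    charset_from_content_type("text/html; charset=UTF-8"))

# The server sends no charset, the fallback applies, and the same bytes
# decode "successfully" into eight unrelated Latin-1 characters.
as_mislabeled = document_bytes.decode(charset_from_content_type("text/html"))
```

The document bytes are identical in both cases; only the header differs, which is the point being made about giving authors control over it.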
Re: [whatwg] Spelling error: labelled -- labeled
Elliotte Harold wrote:
> Multiple times in the draft labelled should be changed to labeled (unless maybe this is a British spelling?)
labeled is the en-US spelling, labelled is the correct spelling in en-GB, en-AU and many other countries.
> wouldn't have noticed if the spell checker in Thunderbird hadn't complained.
If you change to the en-AU or en-GB dictionary, it will complain about the other.
-- Lachlan Hunt http://lachy.id.au/
Re: [whatwg] getElementsByClassName() idea
On Sun, 05 Nov 2006 16:18:32 +0600, Anne van Kesteren [EMAIL PROTECTED] wrote:
>>> I think this hasn't been suggested before. Perhaps the method should accept a DOMTokenString as argument instead of an array. This allows things like ele.getElementsByClassName(ele.className) etc. The only problem is how to get a DOMTokenString without first getting .className from somewhere. Perhaps it should be a constructor as well. 'x = new DOMTokenString("aaa bbb")'
>> How is it better than DOMString?
> It inherits from DOMString. http://www.whatwg.org/specs/web-apps/current-work/#domtokenstring defines it.
I still don't get what the advantage is of having getElementsByClassName take a DOMTokenString argument over a plain DOMString.
-- Alexey Feldgendler [EMAIL PROTECTED] [ICQ: 115226275] http://feldgendler.livejournal.com
Re: [whatwg] getElementsByClassName() idea
On Sun, 05 Nov 2006 14:27:14 +0100, Alexey Feldgendler [EMAIL PROTECTED] wrote:
> I still don't get it what's the advantage of having getElementsByClassName take a DOMTokenString argument over a plain DOMString.
Oh right, sorry. Yeah, I suppose a DOMString makes more sense.
-- Anne van Kesteren http://annevankesteren.nl/ http://www.opera.com/
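To make the plain-string variant concrete, here is a minimal Python model of a lookup that takes one space-separated string of class names. It is not the DOM API: both function names and the (name, class attribute) pairs are hypothetical, and the rule that an element must carry every requested token is an assumption of this sketch, since the thread does not spell out the matching semantics.

```python
def matches_class_names(element_class_attr, wanted):
    """True if the element's class attribute contains every token in `wanted`.

    Both arguments are plain strings of whitespace-separated tokens,
    which is all a caller needs to pass element.className straight through.
    """
    element_tokens = set(element_class_attr.split())
    return set(wanted.split()) <= element_tokens


def get_elements_by_class_name(elements, wanted):
    """Filter a hypothetical list of (name, class-attribute) pairs."""
    return [name for name, cls in elements if matches_class_names(cls, wanted)]


# Hypothetical document: three elements with different class attributes.
elements = [("a", "aaa bbb"), ("b", "bbb"), ("c", "bbb ccc aaa")]
found = get_elements_by_class_name(elements, "aaa bbb")
```

Token order in the string does not matter in this model, so an element's own class attribute always matches itself.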
Re: [whatwg] The problems with namespaces in text/html
On Nov 5, 2006, at 15:12, Elliotte Harold wrote:
> Anne van Kesteren wrote:
>> Sorry, but the model of the web is the DOM, whether you like it or not.
> *The* model of the Web is not DOM.
The model in the browsers that matter is the DOM. Unfortunately. But it is too late to change it. And having even that level of interop is great.
> The map is not the world. The model is not the document. Any model is just a convenient local representation. The reality of the document is its text.
Actually, the reality of an (X)HTML5 document is in the abstract syntax tree parsed out of either serialization. In browsers that allow scripting, the tree has to be exposed via the DOM API and has to implement the DOM notion of the tree data model. But if you are writing a non-browser app that doesn't do scripting, you could use an HTML5 parser that emits SAX2 events and you could construct a XOM tree out of those. You don't need to care whether the SAX2 events came from an HTML5 parser or from an XML parser.
-- Henri Sivonen [EMAIL PROTECTED] http://hsivonen.iki.fi/
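The point about SAX events being independent of their source can be sketched with Python's standard xml.sax module. Everything named here is illustrative: ElementCounter is a hypothetical consumer, and emit_events_by_hand is a hypothetical stand-in for "some other parser", since no HTML5 parser with a SAX interface is assumed to be available. The consumer cannot tell which producer drove it.

```python
import xml.sax
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesImpl


class ElementCounter(ContentHandler):
    """A consumer that only sees events, never the parser behind them."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        self.counts[name] = self.counts.get(name, 0) + 1


def emit_events_by_hand(handler):
    """Hypothetical second producer: fires the same events without an XML parser."""
    handler.startDocument()
    for name in ("html", "body"):
        handler.startElement(name, AttributesImpl({}))
    for text in ("one", "two"):
        handler.startElement("p", AttributesImpl({}))
        handler.characters(text)
        handler.endElement("p")
    for name in ("body", "html"):
        handler.endElement(name)
    handler.endDocument()


# Producer 1: a real XML parser.
from_xml_parser = ElementCounter()
xml.sax.parseString(
    b"<html><body><p>one</p><p>two</p></body></html>", from_xml_parser)

# Producer 2: anything else that speaks the same event interface.
from_other_source = ElementCounter()
emit_events_by_hand(from_other_source)
```

Both handlers end up with the same counts, which is the sense in which downstream code need not care where the events came from.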
Re: [whatwg] The problems with namespaces in text/html
* Lachlan Hunt wrote:
> Yes, never! For one, a conforming HTML 5 (not XHTML 5) document requires the DOCTYPE to be <!DOCTYPE html> and that is not well-formed XML.
Yes it is.
-- Björn Höhrmann · mailto:[EMAIL PROTECTED] · http://bjoern.hoehrmann.de Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de 68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Re: [whatwg] The problems with namespaces in text/html
Henri Sivonen wrote:
> The model in the browsers that matter is the DOM. Unfortunately. But it is too late to change it. And having even that level of interop is great.
But as I keep saying, *it's not just browsers*. There's a lot more happening on the Web than classic desktop browsers. A document posted to the Web is available to all sorts of clients. Some of the most interesting things happen when there's no human in the loop to look at a browser.
-- Elliotte Rusty Harold [EMAIL PROTECTED]
Re: [whatwg] The problems with namespaces in text/html
Bjoern Hoehrmann wrote:
> * Lachlan Hunt wrote:
>> Yes, never! For one, a conforming HTML 5 (not XHTML 5) document requires the DOCTYPE to be <!DOCTYPE html> and that is not well-formed XML.
> Yes it is.
Good catch. I forgot that. There are one or two XML parsers that blow this one, but they're not much used. The specific BNF production is:
doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S? ('[' intSubset ']' S?)? '>'
External ID and system ID are both optional. Is there anything else that stops every HTML5 document from being a well-formed XML document?
-- Elliotte Rusty Harold [EMAIL PROTECTED]
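The doctype question is easy to check with an XML parser. The sketch below assumes Python's standard xml.etree.ElementTree (backed by expat) as the parser; the helper name is_well_formed and both sample documents are made up for illustration. It shows that the short doctype alone does not make a document malformed, while an unclosed start-tag does.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if an XML parser accepts the text, False if it rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Short doctype with no external ID, everything else properly closed.
with_short_doctype = (
    "<!DOCTYPE html><html><head><title>t</title></head>"
    "<body><p>x</p></body></html>")

# Same document with an unclosed <br> start-tag.
with_unclosed_tag = (
    "<!DOCTYPE html><html><head><title>t</title></head>"
    "<body><p>x<br></p></body></html>")

short_doctype_ok = is_well_formed(with_short_doctype)
unclosed_tag_ok = is_well_formed(with_unclosed_tag)
```

This only tests well-formedness of two hand-written samples; it does not settle whether some other HTML5 conformance requirement conflicts with XML.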
Re: [whatwg] The problems with namespaces in text/html (Was: MathML-in-HTML5)
Elliotte Harold wrote:
> Lachlan Hunt wrote:
>> Why is the specific syntax so important?
> The specific syntax is important because there's a huge, useful toolchain for processing XML and there's essentially zilch for processing this strange HTML 5 thing.
In the real world, you cannot expect to be able to process content served as text/html using XML tools. That's like trying to compile Java with a C++ compiler. Despite the similarities in syntax, they are different languages that require different parsers.
> Why not ditch the HTML 5 layer completely and simply allow the XML tools direct access?
Because we have to remain compatible with the web, where there are an infinite number of existing documents that browsers must be able to handle interoperably.
> Who, exactly is this HTML serialization supposed to help?
Anyone for whom interoperability in processing real world content is important. This includes, among others:
* Browser vendors that have to deal with real world content.
* CMS, editor, and other tool vendors that have to accept HTML input from users.
* Authors that have to develop for the real world.
* Users who like to surf the web in any browser they choose.
-- Lachlan Hunt http://lachy.id.au/
Re: [whatwg] The problems with namespaces in text/html
* Elliotte Harold wrote:
> Really? Never? There are many HTML 4 documents that are well-formed XML documents? Are these not legal HTML 5 documents? I scanned the spec quickly, but I didn't find anything that was flat out forbidden by XML.
For a document to be a HTML 4 document it would need a HTML 4.01 document type declaration, and for a document to be a well-formed XML document, it would have to have no HTML 4.01 document type declaration. There are not many documents that both have and don't have a HTML 4.01 document type declaration. Remember that the DTD needs to match the production for external subsets.
-- Björn Höhrmann · mailto:[EMAIL PROTECTED] · http://bjoern.hoehrmann.de Weinh. Str. 22 · Telefon: +49(0)621/4309674 · http://www.bjoernsworld.de 68309 Mannheim · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
Re: [whatwg] The problems with namespaces in text/html
On Nov 5, 2006, at 15:54, Elliotte Harold wrote:
> Henri Sivonen wrote:
>> The model in the browsers that matter is the DOM. Unfortunately. But it is too late to change it. And having even that level of interop is great.
> But as I keep saying, *it's not just browsers*. There's a lot more happening on the Web than classic desktop browsers. A document posted to the Web is available to all sorts of clients. Some of the most interesting things happen when there's no human in the loop to look at a browser.
If your app does not run scripts from the Web, you are exempt from using the DOM as the model. Quoting the spec:
"User agents with no scripting support
Implementations that do not support scripting (or which have their scripting features disabled) are exempt from supporting the events and DOM interfaces mentioned in this specification. For the parts of this specification that are defined in terms of an events model or in terms of the DOM, such user agents must still act as if events and the DOM were supported."
I predict that in practice many non-browser apps will be in violation of the last sentence, because their authors will see more value in being able to use a streaming API without buffering or in being able to decouple the parser from the tree builder than in having interoperable error recovery.
-- Henri Sivonen [EMAIL PROTECTED] http://hsivonen.iki.fi/
Re: [whatwg] The problems with namespaces in text/html
Bjoern Hoehrmann wrote:
> * Lachlan Hunt wrote:
>> Yes, never! For one, a conforming HTML 5 (not XHTML 5) document requires the DOCTYPE to be <!DOCTYPE html> and that is not well-formed XML.
> Yes it is.
Oh, sorry. You're right. I thought it required a PUBLIC or SYSTEM identifier. I got confused because, unlike SGML, you can't have a public identifier without a system identifier.
-- Lachlan Hunt http://lachy.id.au/
Re: [whatwg] The problems with namespaces in text/html (Was: MathML-in-HTML5)
Lachlan Hunt wrote:
>> Why not ditch the HTML 5 layer completely and simply allow the XML tools direct access?
> Because we have to remain compatible with the web, where there are an infinite number of existing documents that browsers must be able to handle interoperably.
You're getting this backwards. There's no reason for HTML 5 to be compatible with existing *documents*. Existing browsers and tools, sure; but other documents can be handled on their own.
>> Who, exactly is this HTML serialization supposed to help?
> Anyone for whom interoperability in processing real world content is important. This includes, among others:
> * Browser vendors that have to deal with real world content.
Browser vendors can handle XHTML now. It's a non-issue for them.
> * CMS, editor, and other tool vendors that have to accept HTML input from users.
They mostly don't use HTML now. Instead they use things like markdown. If they do accept HTML, they tidy it up in various ways for security reasons, if not for well-formedness. Adding well-formedness checking to those that don't is quite simple. It's a question of will and desire, not ability.
> * Authors that have to develop for the real world.
I have no idea what you mean by this. I suspect it's redundant.
> * Users who like to surf the web in any browser they choose.
As long as any browser they choose is some browser released in this millennium, XHTML is fine.
I'm sorry. The use cases so far just don't hold water. I put the question again: who does HTML serialization help? What problems does this solve?
-- Elliotte Rusty Harold [EMAIL PROTECTED]
Re: [whatwg] The problems with namespaces in text/html
Anne van Kesteren wrote:
> Well, the problem is that they would mean different things. Consider the following fragment:
Meaning is in the eye of the beholder. In point of fact, there are a lot more than two different things the fragment you propose might mean. Meaning is determined locally by each recipient for its own unique purposes, which may or may not be anything close to what the document producer expects. The idea that the server can somehow impose its interpretation of the content on the recipient is an illusion. It's never been true, and it's never going to be true, no matter how many specs you write.
The syntax matters. If you give me the right well-formed syntax, I can do what I need to do with it, as can others. If you give me malformed syntax, working with the document gets a lot more complicated. My concern is not with browser vendors agreeing on one interpretation that's somehow useful to them. It's making sure that they don't in the process break everything else anyone else might want to do with these documents.
Where's Walter Perry when you need him? :-)
-- Elliotte Rusty Harold [EMAIL PROTECTED]
Re: [whatwg] The problems with namespaces in text/html
On Sun, 05 Nov 2006 15:29:29 +0100, Elliotte Harold [EMAIL PROTECTED] wrote: * Browser vendors that have to deal with real world content. Browser vendors can handle XHTML now. It's a non-issue for them. Working for one I can assure you it's very much an issue. -- Anne van Kesteren http://annevankesteren.nl/ http://www.opera.com/
Re: [whatwg] The problems with namespaces in text/html
On Nov 5, 2006, at 14:30, Elliotte Harold wrote: Henri Sivonen wrote: Personally, I think MathML is so hopelessly verbose for hand authoring that this really shouldn't be about enabling hand authoring MathML-in-HTML5 but about enabling MathML-in-HTML5 (perhaps generated by a future version of itex2mml or similar) to be served through content management systems that are not built around a SAX pipeline or an XML tree API or XSLT but are built as tag soup systems and simply cannot guarantee well-formedness. I mean systems like WordPress and MovableType. Please don't confuse what these systems won't do with what they can't do. I personally wrote a system like this that maintained full well-formedness at all times despite TagSoup input. In fact, that was one of the easiest parts of what it did. See http://cafe.elharo.com/web/mokka/ http://hsivonen.iki.fi/validator/?doc=http%3A%2F%2Fcafe.elharo.com%2Fweb%2Fmokka%2F&parser=xml&laxtype=yes I think that makes my point for me. Today's tools and libraries make it easy for anyone using anything more advanced than a text editor and FTP to publish well-formed documents. Tools that don't do that by default should be fixed. In order for a non-trivial tool to produce well-formed output consistently and reliably, the architecture of the tool needs to be designed with this goal in mind. It is remarkably difficult to make e.g. WordPress or MovableType emit stuff that is for sure suitable for serving as application/xhtml+xml. -- Henri Sivonen [EMAIL PROTECTED] http://hsivonen.iki.fi/
Re: [whatwg] The problems with namespaces in text/html
On Nov 5, 2006, at 16:39, Elliotte Harold wrote: Henri Sivonen wrote: Is there anything else that stops every HTML5 document from being a well-formed XML document? Case-insensitivity and empty elements for example. These would stop some documents from being well-formed, not all. I'm sure you're allowed to use all lower case. Can you use empty-element tags if you wish? Or must it be <br> and not <br /> or <br></br>? It must be <br> to be conforming. You are still stuck with the syntactic similarity of XML and HTML5. You wouldn't use an XML parser to parse RELAX NG Compact Syntax, would you? Also, even if a subset of HTML5 documents happened to be parseable as XML, it doesn't help unless the authors whose documents you consume happen to only produce that subset. If your app insists on using an XML parser for text/html content, it isn't very useful for processing the stuff found on the Web. But in any case, even if an HTML5 byte stream happens to be parseable as XML 1.0, you get the wrong infoset if you use an XML parser instead of an HTML5 parser. Walter, we need you! There is no right infoset. There is no wrong infoset. Given this HTML document:

<!DOCTYPE html><HTML><title>Foo</Title><p>bar</html>

a parser should convey to the application a tree that has the following features:

* There is a root element node with the local name "html" in the "http://www.w3.org/1999/xhtml" namespace.
* The root element node has two child nodes.
* The root element node has an element node with the local name "head" in the "http://www.w3.org/1999/xhtml" namespace as its first child.
* The root element node has an element node with the local name "body" in the "http://www.w3.org/1999/xhtml" namespace as its last child.
* The first child of the root element has a single child node, which is an element node with the local name "title" in the "http://www.w3.org/1999/xhtml" namespace.
* The last child of the root element has a single child node, which is an element node with the local name "p" in the "http://www.w3.org/1999/xhtml" namespace.
* The element with the local name "title" in the "http://www.w3.org/1999/xhtml" namespace has a single child node, which is a text node with the value "Foo".
* The element with the local name "p" in the "http://www.w3.org/1999/xhtml" namespace has a single child node, which is a text node with the value "bar".

If your parser reports something else, it is not suitable for parsing HTML5 and is *wrong* per spec. The infoset I derive from the document is my concern, not yours. You want certain stuff to be in a particular namespace. From this thread, it seems that you want to make it my problem to produce particular namespace declaration syntax--instead of making it your concern to use an HTML5 parser. -- Henri Sivonen [EMAIL PROTECTED] http://hsivonen.iki.fi/
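For readers who want to try the example document from the message above, here is a rough sketch in Python. It is an illustration only, not part of the original thread. It assumes the document's markup was `<!DOCTYPE html><HTML><title>Foo</Title><p>bar</html>`, and it uses the standard library's `html.parser` as a tolerant stand-in. That module is a tokenizer, not an HTML5 parser, so it does not build the tree with the implied head and body elements described in the list. The names `xml_parse_error` and `TagLogger` are made up for this sketch.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# The example document, with its markup written out (an assumption about
# what the original message contained).
DOC = "<!DOCTYPE html><HTML><title>Foo</Title><p>bar</html>"


def xml_parse_error(text):
    """Return the XML parser's error message, or None if text is well-formed."""
    try:
        ET.fromstring(text)
    except ET.ParseError as exc:
        return str(exc)
    return None


class TagLogger(HTMLParser):
    """Record the tag and text tokens reported by a tolerant HTML tokenizer."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        # html.parser reports tag names in lower case.
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


logger = TagLogger()
logger.feed(DOC)
logger.close()

print("XML parser says:", xml_parse_error(DOC))
print("HTML tokenizer saw:", logger.events)
```

The XML parser stops at the first well-formedness error, while the tokenizer reports lower-cased tags and never invents a head or body element. Producing the tree in the list needs an HTML5 tree builder on top of the tokenizer.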
Re: [whatwg] The problems with namespaces in text/html
On Nov 5, 2006, at 16:35, Elliotte Harold wrote: Anne van Kesteren wrote: Well, the problem is that they would mean different things. Consider the following fragment: Meaning is in the eye of the beholder. In point of fact, there are a lot more than two different things the fragment you propose might mean. Meaning is determined locally by each recipient for its own unique purposes, which may or may not be anything close to what the document producer expects. Well, the whole point of having a spec is to give a mutual understanding to the producer and consumer of what the bytes mean. The syntax matters. If you give me the right well-formed syntax, I can do what I need to do with it, as can others. If you give me malformed syntax, working with the document gets a lot more complicated. My concern is not with browser vendors agreeing on one interpretation that's somehow useful to them. It's making sure that they don't in the process break everything else anyone else might want to do with these documents. Everyone who wishes to process HTML5 with XML tools is going to need an HTML5 parser that exposes an interface that makes the HTML5 parser look like an XML parser to the rest of the tool chain. The part of the XML tool chain that you can't use with HTML5 is the XML parser itself. -- Henri Sivonen [EMAIL PROTECTED] http://hsivonen.iki.fi/
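The idea in the last paragraph of the message above, an HTML parser that looks like an XML parser to the rest of the tool chain, can be sketched roughly as follows. This is an illustration under stated assumptions, not anything proposed in the thread. Python's `html.parser` stands in for a real HTML5 parser (it performs no HTML5 tree construction or error recovery), and `HtmlToSax` and `Recorder` are hypothetical names. The downstream code only ever sees SAX `ContentHandler` callbacks, with every element reported in the XHTML namespace.

```python
from html.parser import HTMLParser
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesNSImpl

XHTML_NS = "http://www.w3.org/1999/xhtml"


class HtmlToSax(HTMLParser):
    """Forward tokens from a tolerant HTML tokenizer to a SAX ContentHandler.

    Stand-in only: html.parser is not an HTML5 parser, so implied elements
    and misnesting repairs are not handled here.
    """

    def __init__(self, handler):
        super().__init__()
        self.handler = handler

    def feed_document(self, text):
        self.handler.startDocument()
        self.feed(text)
        self.close()  # flush any buffered text
        self.handler.endDocument()

    def handle_starttag(self, tag, attrs):
        # Attributes are reported in no namespace; valueless ones become "".
        values = {(None, name): (value or "") for name, value in attrs}
        qnames = {(None, name): name for name, _ in attrs}
        self.handler.startElementNS(
            (XHTML_NS, tag), tag, AttributesNSImpl(values, qnames)
        )

    def handle_endtag(self, tag):
        self.handler.endElementNS((XHTML_NS, tag), tag)

    def handle_data(self, data):
        self.handler.characters(data)


class Recorder(ContentHandler):
    """A minimal consumer that never sees source bytes, only SAX events."""

    def __init__(self):
        super().__init__()
        self.started = []
        self.text = []

    def startElementNS(self, name, qname, attrs):
        self.started.append(name)

    def characters(self, content):
        self.text.append(content)


recorder = Recorder()
HtmlToSax(recorder).feed_document('<P class="x">hi<br>')
print(recorder.started)
```

The consumer receives namespaced element names even though the input contains no namespace declaration, which is the point of putting the adapter at the parser boundary.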
Re: [whatwg] The problems with namespaces in text/html (Was: MathML-in-HTML5)
On Nov 5, 2006, at 16:29, Elliotte Harold wrote: Lachlan Hunt wrote: Why not ditch the HTML 5 layer completely and simply allow the XML tools direct access? Because we have to remain compatible with the web, where there are an infinite number of existing documents that browsers must be able to handle interoperably. You're getting this backwards. There's no reason for HTML 5 to be compatible with existing *documents*, existing browsers and tools sure; but other documents can be handled on their own. The HTML5 parsing algorithm is not about adding a third parser alongside old HTML and XML. It is about defining a parsing algorithm for text/html content in general--including content that purports to conform to older HTML specs. Browser vendors can handle XHTML now. It's a non-issue for them. Having worked recently on improving XHTML handling in Gecko, I assure you that XHTML is not a non-issue from the browser point of view. * CMS, editor, and other tool vendors that have to accept HTML input from users. They mostly don't use HTML now. Instead they use things like markdown. If they do accept HTML, they tidy it up in various ways for security reasons, if not for well-formedness. Adding well-formedness checking to those that don't is quite simple. It's a question of will and desire, not ability. You are underestimating the bozo factor on ability and the effect of legacy code even when there is desire. -- Henri Sivonen [EMAIL PROTECTED] http://hsivonen.iki.fi/
Re: [whatwg] The problems with namespaces in text/html (Was: MathML-in-HTML5)
Henri Sivonen wrote: If my parser tells your ContentHandler that there is a namespace, then there is. Your ContentHandler doesn't get to see any source bytes. But you don't send me a parser and you don't talk to my ContentHandler. Your HTTP server sends me a stream of bytes that my parser then processes. All I get from you is bytes. This is what we exchange; not infosets, not DOM trees, not trees of any sort. -- Elliotte Rusty Harold [EMAIL PROTECTED]
Re: [whatwg] The problems with namespaces in text/html
Henri Sivonen wrote: http://cafe.elharo.com/web/mokka/ http://hsivonen.iki.fi/validator/?doc=http%3A%2F%2Fcafe.elharo.com%2Fweb%2Fmokka%2F&parser=xml&laxtype=yes I think that makes my point for me. Not really. That's not the system I was talking about. The article at the URL I referenced describes the old system as well as the reasons I switched to WordPress, but that had nothing to do with well-formedness. My point is that well-formedness is not hard to maintain if the server vendor cares to do so. The problem is that many server vendors don't understand why they should make even a small effort in this direction. Catering to their problems does not strike me as a wise course for the Web. The URL timed out when I tried to use your validator so I'm not sure what I saw. xmllint did pick up one problem with an unescaped ampersand in some third-party code that draconian error handling would have noticed weeks ago. I'll have to bug that provider to fix that. -- Elliotte Rusty Harold [EMAIL PROTECTED]
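As a small illustration of the unescaped-ampersand problem mentioned above, the following Python sketch shows how a conforming XML parser rejects such markup outright. It uses the standard library's expat-based ElementTree rather than xmllint, and the markup and the `example.com` URL are invented for the example. `first_wellformedness_error` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def first_wellformedness_error(markup):
    """Return (line, column) of the first XML error, or None if well-formed."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as exc:
        return exc.position
    return None


# Hypothetical third-party snippet with a raw "&" in a query string.
BAD = '<p><a href="http://example.com/?a=1&b=2">link</a></p>'
# The same snippet with the ampersand escaped as XML requires.
GOOD = BAD.replace("&b", "&amp;b")

print("unescaped:", first_wellformedness_error(BAD))
print("escaped:  ", first_wellformedness_error(GOOD))
```

A check like this run at publish time is one way a server could notice the problem immediately instead of weeks later.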
Re: [whatwg] colspan=0
On Nov 4, 2006, at 1:53 AM, Henri Sivonen wrote: None of Opera 9.02, Firefox 2.0, IE7 and Safari 2.0.4 implement colspan=0 as specified in HTML 4.01. Trident, Presto and WebKit at least agree on what to do with it: they treat it like colspan=1. I suggest that only positive integers be conforming and that non-conforming values be treated as 1. ... I know browser vendors have had a long time to implement this, but still, I think giving up on it would be a shame. The number of rows or columns in a table is often rather expensive to calculate ahead of time. As long as this has to be done to calculate the rowspan= or colspan= of header cells, this can substantially increase the time an application takes to generate a table. For the browser to interpret colspan=0 or rowspan=0 instead would both make life easier for application authors, and make such pages faster overall. -- Matthew Paul Thomas http://mpt.net.nz/
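To make the cost argument above concrete, here is a rough Python sketch of the two ways an application might emit a table whose header cell spans every column. It is an illustration only. The function names are hypothetical, and the streamed variant assumes the HTML 4.01 meaning of colspan="0" (span to the last column of the column group), which, as the quoted message reports, none of the tested browsers implement.

```python
from html import escape


def cells(row):
    """Render one row's values as <td> cells."""
    return "".join(f"<td>{escape(str(value))}</td>" for value in row)


def table_counted(title, rows):
    """Buffer every row first so the header's colspan is a real column count."""
    buffered = [list(row) for row in rows]  # all data must exist up front
    width = max((len(row) for row in buffered), default=1)
    parts = [f'<table><tr><th colspan="{width}">{escape(title)}</th></tr>']
    parts += [f"<tr>{cells(row)}</tr>" for row in buffered]
    parts.append("</table>")
    return "".join(parts)


def table_streamed(title, rows):
    """Yield markup as rows arrive; relies on HTML 4.01's colspan="0"."""
    yield f'<table><tr><th colspan="0">{escape(title)}</th></tr>'
    for row in rows:
        yield f"<tr>{cells(row)}</tr>"
    yield "</table>"


print(table_counted("Totals", [[1, 2, 3], [4, 5]]))
print("".join(table_streamed("Totals", iter([[1, 2, 3], [4, 5]]))))
```

The first function cannot write the header until it has seen every row, while the second can start sending output before the data source is exhausted. That difference is the saving the message argues for.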