Re: [whatwg] several messages about the HTML syntax
On 03/02/2008 03:02 PM, Ian Hickson wrote:

On Tue, 31 Jul 2007, Philip Taylor wrote:

IE undocumentedly recognises some which nobody else does:

aafs  U+206D  ACTIVATE ARABIC FORM SHAPING
ass   U+206B  ACTIVATE SYMMETRIC SWAPPING
iafs  U+206C  INHIBIT ARABIC FORM SHAPING
iss   U+206A  INHIBIT SYMMETRIC SWAPPING
lre   U+202A  LEFT-TO-RIGHT EMBEDDING
lro   U+202D  LEFT-TO-RIGHT OVERRIDE
nads  U+206E  NATIONAL DIGIT SHAPES
nods  U+206F  NOMINAL DIGIT SHAPES
pdf   U+202C  POP DIRECTIONAL FORMATTING
rle   U+202B  RIGHT-TO-LEFT EMBEDDING
rlo   U+202E  RIGHT-TO-LEFT OVERRIDE
zwsp  U+200B  ZERO WIDTH SPACE

(I believe that list is complete.)

The first eleven were suggested on https://listserv.heanet.ie/cgi-bin/wa?A2=ind9605L=html-wgP=4579 some time ago but don't seem to have gone very far (except into IE).

I can see some legitimate users at http://www.tasb.com/services/field/staff/index.aspx?print=true and http://www.pelesoft.co.il/ and maybe there's a few dozen or hundred more elsewhere (but I can't measure it easily). There's some in text-art at http://yy28.60.kg/test/read.cgi/maido3/1096370177/l50 and quite a lot in weird places like http://cheese.2ch.net/life/kako/1010/10103/1010391447.html or http://zerosen52.gozaru.jp/log/1093422333.html that I don't understand but that seem to all be on 2channel (or copied from it). I've no idea how common they are in general.

Are these used significantly on the web, or would they be considered highly useful if anyone knew they existed, or should HTML5 just ignore them?

I'm very skeptical about introducing entities for the codes that are redundant with dir= and <bdo> (namely, lre, lro, pdf, rle, rlo).

I agree 100% with this rationale.

I don't know enough about the others to have an educated opinion. I can set up a search to examine the data in more detail.

I don't know much about the others, but I can provide some info on ZWSP. It is (as specced) equivalent to <wbr>. Specifically, it

* defines a word break (line-breaking opportunity)
* thereby breaks Arabic joining

For contrast:

ZWSP - Breaks a word (and therefore also Arabic joining) with no visible space.
ZWJ  - Not a word break. Forces joining behavior.
ZWNJ - Not a word break, but breaks joining.

ZWJ and ZWNJ are primarily useful for Arabic and other shaped scripts. I'm not sure of the common uses for ZWJ, but ZWNJ is frequently used in Persian to visually separate grammatical prefixes from the rest of the word (without breaking the word or introducing extra space).

ZWSP is more likely to be used in Thai and related scripts, to define word boundaries. Thai does not use spaces between words, so break opportunities need to either be marked with ZWSP or found with a dictionary. Even in the presence of automatic dictionary-breaking, however, there are ambiguous cases which will need ZWSP to show the correct break-point. There's further discussion about this in https://www.w3.org/Bugs/Public/show_bug.cgi?id=13108

I've no comment on concerns about compatibility with XML, but I can say that I've typed &zwsp; multiple times expecting it to work and find it surprising that &zwj; and &zwnj; work, but &zwsp; does not...

~fantasai
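For readers who want to try the three zero-width characters contrasted above, here is a minimal Python sketch. It only looks up their code points and names and joins pre-segmented words with ZWSP. The names `ZERO_WIDTH`, `describe` and `mark_breaks` are made up for this illustration, and the segmentation itself (the hard part for Thai) is assumed to be done elsewhere.

```python
import unicodedata

# Code points of the three characters discussed above.
# The dictionary and function names are illustrative, not from any spec.
ZERO_WIDTH = {"ZWSP": "\u200B", "ZWNJ": "\u200C", "ZWJ": "\u200D"}

def describe(label):
    """Return the label, code point and Unicode name of one character."""
    ch = ZERO_WIDTH[label]
    return f"{label}: U+{ord(ch):04X} {unicodedata.name(ch)}"

def mark_breaks(words):
    """Join already-segmented words with ZWSP, marking break opportunities
    without adding visible space."""
    return ZERO_WIDTH["ZWSP"].join(words)

if __name__ == "__main__":
    for label in ZERO_WIDTH:
        print(describe(label))
```

Whether a renderer actually breaks a line at the inserted ZWSP depends on the user agent. The sketch only builds the string.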
Re: [whatwg] several messages about the HTML syntax
On Mon, 03-03-2008, at 20:18 +, David Gerard wrote:

On 03/03/2008, Ian Hickson [EMAIL PROTECTED] wrote:

On Mon, 3 Mar 2008, Krzysztof Żelechowski wrote:

When I want to define a paragraph-style tool-tip, I am left with the following choice: either make the source code unreadable by making an excessively long line (this is also true for URI attributes but they are not expected to be readable) or make the tool-tip ugly by inserting line breaks. (It cannot be done in a portable way because the width of the tool-tip window and the fount metrics at the viewer's UI are unknown).

I recommend not making paragraph-long tooltips. That's terrible user interface.

But how will we read the asides on xkcd.com ?!

Admittedly, we cannot, at least not in Firefox. Hm.

Chris
Re: [whatwg] several messages about the HTML syntax
On Sun, 02-03-2008, at 23:02 +, Ian Hickson wrote:

On Tue, 31 Jul 2007, Simon Pieters wrote:

Aha. I didn't think of testing attributes.

Safari preserves CRs in attribute values, both real and NCRs. CRLF pairs, LFCR pairs, CRs and LFs cause a single linebreak in the tooltip. In data, CRs don't cause linebreaks.

For title=, IE preserves CRs in attribute values, both real and NCRs. CRLF pairs, CRs and LFs in the DOM get rendered as a single linebreak in the tooltip. For value=, all types of linebreaks are converted to CRLF pairs. In data, only CRs cause linebreaks and LFs are rendered as spaces.

Firefox preserves CRs in attribute values, both real and NCRs. CRs are ignored and LFs are rendered as spaces in the tooltip. In data, CRs don't cause linebreaks.

For title=, Opera drops LFs in attribute values, both real and NCRs, and converts CRs (both real and NCRs) to spaces. For value=, CRs and LFs are preserved as written, both real and NCRs.

Personally, I think attribute values should be parsed the same way as data is parsed wrt linebreaks.

I agree.

I oppose. When I want to define a paragraph-style tool-tip, I am left with the following choice: either make the source code unreadable by making an excessively long line (this is also true for URI attributes but they are not expected to be readable) or make the tool-tip ugly by inserting line breaks. (It cannot be done in a portable way because the width of the tool-tip window and the fount metrics at the viewer's UI are unknown).

I recommend: convert line feeds to spaces, use &#8232; and &#8233; where appropriate, use &#10;, &#13; where necessary (these should not be converted to spaces).

Chris
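The recommendation at the end of this message can be sketched in a few lines of Python. This is only an illustration of the proposal, not what any browser or the spec does. The function name `normalize_title` is hypothetical, literal CRs and CRLF pairs are assumed to be treated like literal LFs, and only decimal numeric character references are decoded.

```python
import re

def normalize_title(raw):
    """Sketch of the proposal: literal line breaks in the attribute source
    become spaces, while breaks written as numeric character references
    (e.g. &#10;, &#13;, &#8232;) are decoded afterwards and so survive."""
    # Step 1: collapse each literal line break into a single space.
    # (Treating CR and CRLF like LF is an assumption of this sketch.)
    text = re.sub(r"\r\n|\r|\n", " ", raw)
    # Step 2: decode decimal numeric character references only.
    return re.sub(r"&#(\d+);", lambda m: chr(int(m.group(1))), text)

if __name__ == "__main__":
    source = "A long tool-tip wrapped\nin the source&#10;with one real break."
    print(repr(normalize_title(source)))
```

The order of the two steps is the point of the design: because references are decoded only after literal breaks have been replaced, an author can wrap the source freely and still force a break where one is wanted.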
Re: [whatwg] several messages about the HTML syntax
On Mon, 3 Mar 2008, Krzysztof Żelechowski wrote:

When I want to define a paragraph-style tool-tip, I am left with the following choice: either make the source code unreadable by making an excessively long line (this is also true for URI attributes but they are not expected to be readable) or make the tool-tip ugly by inserting line breaks. (It cannot be done in a portable way because the width of the tool-tip window and the fount metrics at the viewer's UI are unknown).

I recommend not making paragraph-long tooltips. That's terrible user interface.

-- 
Ian Hickson
http://ln.hixie.ch/
Things that are impossible just take longer.
Re: [whatwg] several messages about the HTML syntax
On 03/03/2008, Ian Hickson [EMAIL PROTECTED] wrote:

On Mon, 3 Mar 2008, Krzysztof Żelechowski wrote:

When I want to define a paragraph-style tool-tip, I am left with the following choice: either make the source code unreadable by making an excessively long line (this is also true for URI attributes but they are not expected to be readable) or make the tool-tip ugly by inserting line breaks. (It cannot be done in a portable way because the width of the tool-tip window and the fount metrics at the viewer's UI are unknown).

I recommend not making paragraph-long tooltips. That's terrible user interface.

But how will we read the asides on xkcd.com ?!

(i.e.: If people can do something, they will, and this needs to be allowed for. ASCII art in tooltips hits my wrong button, but it's out there. OTOH I've never seen a tooltip in a monospaced font. User-agents treating all whitespace as spaces and reformatting as nicely as they can would be fine to me. I'm sure others will come up with real-life use cases for ridiculously long tooltips.)

- d.
Re: [whatwg] several messages about the HTML syntax
David Gerard wrote:

On 03/03/2008, Ian Hickson [EMAIL PROTECTED] wrote:

On Mon, 3 Mar 2008, Krzysztof Żelechowski wrote:

When I want to define a paragraph-style tool-tip, I am left with the following choice: either make the source code unreadable by making an excessively long line (this is also true for URI attributes but they are not expected to be readable) or make the tool-tip ugly by inserting line breaks. (It cannot be done in a portable way because the width of the tool-tip window and the fount metrics at the viewer's UI are unknown).

I recommend not making paragraph-long tooltips. That's terrible user interface.

But how will we read the asides on xkcd.com ?!

(i.e.: If people can do something, they will, and this needs to be allowed for. ASCII art in tooltips hits my wrong button, but it's out there. OTOH I've never seen a tooltip in a monospaced font. User-agents treating all whitespace as spaces and reformatting as nicely as they can would be fine to me. I'm sure others will come up with real-life use cases for ridiculously long tooltips.)

The current spec doesn't forbid paragraph-style tooltips. It just doesn't pander to them. This seems like a very good tradeoff.

/ Jonas
Re: [whatwg] several messages about the HTML syntax
On Sun, 2 Mar 2008 23:02:07 + (UTC), Ian Hickson wrote:

On Sat, 15 Sep 2007, Henri Sivonen wrote:

Currently, unquoted attributes may start with a = [as in <img alt==Foobar src='404'>]. This means that the notion of conformance fails to catch what is most likely an error: [...] To make the notion of conformance more useful for authors (that is, to make conformance checking catch unintentional stuff), I suggest making starting an unquoted attribute value with a = a parse error.

Done.

On Mon, 17 Sep 2007, Øistein E. Andersen wrote:

An alternative solution would be to require that unquoted attribute values not contain (single or double ASCII) quotes.

Done.

I really meant to say that disallowing quotation marks in unquoted attribute values would make conformance checkers able to detect the particular error pointed out by Mr Sivonen even without disallowing equals signs. (The editor may well have noticed this, but his answer does not reflect this.) Other potential authoring mistakes pointed out since support the case for disallowing quotation marks. I am still not convinced about the usefulness of disallowing equals signs, but I have not considered the issue of which characters should be allowed or not in any detail.

-- 
Øistein E. Andersen
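The two checks discussed in this message (a leading equals sign, and quotation marks inside an unquoted value) can be sketched as a tiny Python function. This is a hypothetical helper for illustration, not the tokenizer from the spec or from any conformance checker. It assumes the unquoted value has already been extracted from the markup.

```python
def unquoted_value_problems(value):
    """Return the reasons an unquoted attribute value looks like an
    authoring mistake, following the two suggestions in the thread.
    The function name and messages are made up for this sketch."""
    problems = []
    if value.startswith("="):
        problems.append("starts with '='")
    if '"' in value or "'" in value:
        problems.append("contains a quotation mark")
    return problems

if __name__ == "__main__":
    # The alt value taken from <img alt==Foobar src='404'> is "=Foobar".
    print(unquoted_value_problems("=Foobar"))
```

A value can trip both checks at once, which is why the function returns a list and does not stop at the first problem.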
Re: [whatwg] several messages about the HTML syntax
Executive summary:

* Changed the &rang; and &lang; entities (which we'd already changed anyway) to something more appropriate. (r1286)
* Made a number of things parse errors to allow conformance checkers to catch common attribute mistakes. (r1292, r1293, r1299, r1303)
* Made a number of changes to parsing for compatibility reasons: entities no longer get parsed between comments in RCDATA elements, three more ways to trigger quirks mode, made DOCTYPE parsing not trigger quirks mode if there's trailing garbage (r1294, r1302, r1306)
* Made entities at the end of an attribute be not a parse error. (r1296)
* A number of editorial changes. (in range r1286 - r1307)

On Fri, 29 Jun 2007, Henri Sivonen wrote:

On Jun 29, 2007, at 11:59, Simon Pieters wrote:

U+003E GREATER-THAN SIGN (>) Parse error. Set the DOCTYPE token's correctness flag to incorrect. Emit that DOCTYPE token. Switch to the data state.

Should the string (public id or system id) that was being built be dropped on the floor as well?

On Fri, 29 Jun 2007, Simon Pieters wrote:

I don't see a good reason to drop it. The doctype's correctness flag is set to incorrect anyway. But I don't feel strongly about it either way.

Agreed.

On Fri, 29 Jun 2007, Simon Pieters wrote:

IE seems to not emit the token for that is in quotes anywhere for both doctypes and bogus comments (or it treats doctypes as bogus comments): !doctype ! ? / This does not apply to these: !-- -- -- % % %

Yeah, I don't think we want to capture IE's complex rules here.

On Sun, 1 Jul 2007, Øistein E. Andersen wrote:

HTML5 currently maps &lang; and &rang; to U+3008 LEFT ANGLE BRACKET, U+3009 RIGHT ANGLE BRACKET, both belonging to `CJK angle brackets' in U+3000--U+303F CJK Symbols and Punctuation. HTML 4.01 maps them to U+2329 LEFT-POINTING ANGLE BRACKET, U+232A RIGHT-POINTING ANGLE BRACKET from `Angle brackets' in the range U+2300--U+23FF Miscellaneous Technical.
Unicode 5.0 notes: These are discouraged for mathematical use because of their canonical equivalence to CJK punctuation. It would probably be better to use U+27E8 MATHEMATICAL LEFT ANGLE BRACKET, U+27E9 MATHEMATICAL RIGHT ANGLE BRACKET from `Mathematical brackets' in U+27C0--U+27EF Miscellaneous Mathematical Symbols-A, characters that did not yet exist when HTML 4.01 was published.

I've made this change.

This approach is suggested by http://unicode.org/Public/math/revision-09/MathMap-9.txt:

27E8; lang; ISOTECH;** # #10216; MATHEMATICAL LEFT ANGLE BRACKET
27E9; rang; ISOTECH;** # #10217; MATHEMATICAL RIGHT ANGLE BRACKET

Moreover, the (few) browsers I have tested render &lang;/&rang;, &#x2329;/&#x232a; and &#x27e8;/&#x27e9; identically or similarly (as < / > in approximative ASCII), whereas &#x3008;/&#x3009; are rendered as full-width East-Asian characters (〈 / 〉).

The browsers I tested were not at all consistent.

On Sun, 1 Jul 2007, L. David Baron wrote:

What's wrong with these mappings, and why shouldn't they also be the mappings in HTML5?

On Sun, 1 Jul 2007, Øistein E. Andersen wrote:

The problem is that they are canonically equivalent to CJK characters.

On Sun, 1 Jul 2007, L. David Baron wrote:

Makes sense. I think I misread your original message. (Although changing them at all seems a little scary.)

Well, we'd changed them anyway (since before they mapped to non-canonical characters); changing them to something better seems at least partially sensible... Browsers are pretty poor on these two entities anyway.

On Fri, 6 Jul 2007, Simon Pieters wrote:

On Fri, 22 Jun 2007 04:19:53 +0200, Ian Hickson [EMAIL PROTECTED] wrote:

a == Safari, Opera and Firefox drop the attribute. IE has an attribute with the name being the empty string and the value being =. The HTML5 parsing spec says that there should be an attribute with the name = and the value the empty string.

The Before attribute name state part of the parsing spec might have to be revisited.
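As an aside on the canonical-equivalence point raised earlier in this message about U+2329/U+232A: it can be checked with Python's standard `unicodedata` module. The sketch below normalizes each candidate to NFC and reports the resulting code points. The helper name `nfc_codepoints` is made up for this illustration.

```python
import unicodedata

def nfc_codepoints(ch):
    """Return the code points of ch after NFC normalization."""
    return [f"U+{ord(c):04X}" for c in unicodedata.normalize("NFC", ch)]

if __name__ == "__main__":
    # HTML 4.01 mappings, then the mathematical brackets proposed instead.
    for ch in ("\u2329", "\u232A", "\u27E8", "\u27E9"):
        print(f"U+{ord(ch):04X} -> {nfc_codepoints(ch)}")
```

The HTML 4.01 characters normalize to the CJK brackets U+3008/U+3009, while the mathematical brackets are left unchanged. That is the practical argument for the new mapping: text passed through a normalizing tool keeps the intended characters.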
I don't see any harm in leaving the spec as-is here, given the lack of interoperability and the fact that there's no real reason to be using attributes with this name anyway. Whatever's simplest to implement is probably best here.

Since it doesn't match any browser, and probably is an authoring mistake (that would silently pass conformance checking in the case of embed), could it be a parse error? (Also update the wording in the syntax section if so.)

Done.

On Mon, 16 Jul 2007, Henri Sivonen wrote:

In the Data State the spec says: U+0026 AMPERSAND (&) When the content model flag is set to one of the PCDATA or RCDATA states: switch to the entity data state. Otherwise: treat it as per the anything else entry below.

html5lib tests, WebKit trunk, Firefox 2.0.0.4 and