Re: [whatwg] several messages about the HTML syntax

2014-07-22 Thread fantasai

On 03/02/2008 03:02 PM, Ian Hickson wrote:


On Tue, 31 Jul 2007, Philip Taylor wrote:


IE undocumentedly recognises some which nobody else does:

aafs  U+206D  ACTIVATE ARABIC FORM SHAPING
ass   U+206B  ACTIVATE SYMMETRIC SWAPPING
iafs  U+206C  INHIBIT ARABIC FORM SHAPING
iss   U+206A  INHIBIT SYMMETRIC SWAPPING
lre   U+202A  LEFT-TO-RIGHT EMBEDDING
lro   U+202D  LEFT-TO-RIGHT OVERRIDE
nads  U+206E  NATIONAL DIGIT SHAPES
nods  U+206F  NOMINAL DIGIT SHAPES
pdf   U+202C  POP DIRECTIONAL FORMATTING
rle   U+202B  RIGHT-TO-LEFT EMBEDDING
rlo   U+202E  RIGHT-TO-LEFT OVERRIDE
zwsp  U+200B  ZERO WIDTH SPACE

(I believe that list is complete.)
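
The table above can be cross-checked against the Unicode character database. What follows is only a minimal Python sketch: the dictionary name `IE_ONLY_ENTITIES` and the helper `describe` are hypothetical, and the entity names come from the list in this message, not from any published IE documentation.

```python
import unicodedata

# Entity names as listed in the message above, mapped to their code points.
IE_ONLY_ENTITIES = {
    "aafs": 0x206D, "ass": 0x206B, "iafs": 0x206C, "iss": 0x206A,
    "lre": 0x202A, "lro": 0x202D, "nads": 0x206E, "nods": 0x206F,
    "pdf": 0x202C, "rle": 0x202B, "rlo": 0x202E, "zwsp": 0x200B,
}

def describe(name: str) -> str:
    """Return 'name U+XXXX CHARACTER NAME' for one entry of the table."""
    cp = IE_ONLY_ENTITIES[name]
    return f"{name} U+{cp:04X} {unicodedata.name(chr(cp))}"

for entity in sorted(IE_ONLY_ENTITIES):
    print(describe(entity))
```

The Unicode names printed this way should agree with the third column of the table.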

The first eleven were suggested on
https://listserv.heanet.ie/cgi-bin/wa?A2=ind9605L=html-wgP=4579 some
time ago but don't seem to have gone very far (except into IE).

I can see some legitimate users at
http://www.tasb.com/services/field/staff/index.aspx?print=true and
http://www.pelesoft.co.il/ and maybe there's a few dozen or hundred
more elsewhere (but I can't measure it easily). There's some in text-art
at http://yy28.60.kg/test/read.cgi/maido3/1096370177/l50 and quite a
lot in weird places like
http://cheese.2ch.net/life/kako/1010/10103/1010391447.html or
http://zerosen52.gozaru.jp/log/1093422333.html that I don't understand
but that seem to all be on 2channel (or copied from it). I've no idea
how common they are in general.

Are these used significantly on the web, or would they be considered
highly useful if anyone knew they existed, or should HTML5 just ignore
them?


I'm very skeptical about introducing entities for the codes that are
redundant with dir= and bdo (namely, lre, lro, pdf, rle, rlo).


I agree 100% with this rationale.


I don't know enough about the others to have an educated opinion. I can
set up a search to examine the data in more detail.


I don't know much about the others, but I can provide some info on ZWSP.
It is (as specced) equivalent to <wbr>. Specifically, it
  * defines a word break (line-breaking opportunity)
  * thereby breaks Arabic joining

For contrast:
  ZWSP - Breaks a word (and therefore also Arabic joining) with no visible space.
  ZWJ  - Not a word break. Forces joining behavior.
  ZWNJ - Not a word break, but breaks joining.

ZWJ and ZWNJ are primarily useful for Arabic and other shaped scripts. I'm
not sure of the common uses for ZWJ, but ZWNJ is frequently used in Persian
to visually separate grammatical prefixes from the rest of the word (without
breaking the word or introducing extra space).

ZWSP is more likely to be used in Thai and related scripts, to define word
boundaries. Thai does not use spaces between words, so break opportunities
need to either be marked with ZWSP or found with a dictionary. Even in the
presence of automatic dictionary-breaking, however, there are ambiguous
cases which will need ZWSP to show the correct break-point.
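
To illustrate the distinction drawn above, here is a small Python sketch. It models only explicit ZWSP marks as break opportunities, which is far simpler than a real line breaker and does no dictionary-based segmentation. The function name `break_opportunities` and the Thai sample string are assumptions chosen for illustration.

```python
import unicodedata

ZWSP, ZWNJ, ZWJ = "\u200b", "\u200c", "\u200d"

def break_opportunities(text: str) -> list[str]:
    """Split text at explicit ZWSP marks only.

    ZWJ and ZWNJ are left in place: they influence joining behaviour,
    not where a line may break.
    """
    return text.split(ZWSP)

# Hypothetical sample: two Thai words with an explicit break mark between them.
thai = "สวัสดี" + ZWSP + "ครับ"
print(break_opportunities(thai))
print([unicodedata.name(c) for c in (ZWSP, ZWNJ, ZWJ)])
```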

There's further discussion about this in
  https://www.w3.org/Bugs/Public/show_bug.cgi?id=13108
I've no comment on concerns about compatibility with XML, but I can say
that I've typed &zwsp; multiple times expecting it to work and find it
surprising that &zwj; and &zwnj; work, but &zwsp; does not...
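
The asymmetry can be reproduced with Python's standard `html` module, which ships a copy of the HTML5 named character reference table. This is a sketch of how that one library behaves, not a statement about any browser. Note that the table does contain a long-form name, `&ZeroWidthSpace;`, for U+200B.

```python
import html
from html.entities import html5

# Decode each reference; unknown names are returned unchanged.
for ref in ("&zwj;", "&zwnj;", "&zwsp;", "&ZeroWidthSpace;"):
    decoded = html.unescape(ref)
    status = "unchanged" if decoded == ref else f"U+{ord(decoded):04X}"
    print(f"{ref:18} -> {status}")

# The short name is simply absent from the table.
print("zwsp;" in html5)
```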

~fantasai


Re: [whatwg] several messages about the HTML syntax

2008-03-04 Thread Krzysztof Żelechowski

On Mon, 03-03-2008 at 20:18 +, David Gerard wrote:
 On 03/03/2008, Ian Hickson [EMAIL PROTECTED] wrote:
  On Mon, 3 Mar 2008, Krzysztof Żelechowski wrote:
 
When I want to define a paragraph-style tool-tip, I am left with the
following choice: either make the source code unreadable by making an
excessively long line (this is also true for URI attributes but they are
not expected to be readable) or make the tool-tip ugly by inserting line
breaks.  (It cannot be done in a portable way because the width of the
tool-tip window and the fount metrics at the viewer's UI are unknown).
 
  I recommend not making paragraph-long tooltips. That's terrible user
   interface.
 
 
 But how will we read the asides on xkcd.com ?!

Admittedly, we cannot, at least not in Firefox.

Hm.

Chris



Re: [whatwg] several messages about the HTML syntax

2008-03-03 Thread Krzysztof Żelechowski

On Sun, 02-03-2008 at 23:02 +, Ian Hickson wrote:

 On Tue, 31 Jul 2007, Simon Pieters wrote:
  
  Aha. I didn't think of testing attributes.
  
  Safari preserves CRs in attribute values, both real and NCRs. CRLF 
  pairs, LFCR pairs, CRs and LFs cause a single linebreak in the tooltip. 
  In data, CRs don't cause linebreaks.
  
  For title=, IE preserves CRs in attribute values, both real and NCRs. 
  CRLF pairs, CRs and LFs in the DOM get rendered as a single linebreak 
  in the tooltip. For value=, all types of linebreaks are converted to 
  CRLF pairs. In data, only CRs cause linebreaks and LFs are rendered as 
  spaces.
  
  Firefox preserves CRs in attribute values, both real and NCRs. CRs are 
  ignored and LFs are rendered as spaces in the tooltip. In data, CRs 
  don't cause linebreaks.
  
  For title=, Opera drops LFs in attribute values, both real and NCRs, 
  and converts CRs (both real and NCRs) to spaces. For value=, CRs and 
  LFs are preserved as written, both real and NCRs.
  
  Personally, I think attribute values should be parsed the same way as 
  data is parsed wrt linebreaks.
 
 I agree.

I oppose.  

When I want to define a paragraph-style tool-tip, 
I am left with the following choice: 
either make the source code unreadable 
by making an excessively long line 
(this is also true for URI attributes 
but they are not expected to be readable)
or make the tool-tip ugly by inserting line breaks.  
(It cannot be done in a portable way 
because the width of the tool-tip window 
and the fount metrics at the viewer's UI are unknown).

I recommend: 
convert line feeds to spaces,
use &#8232; and &#8233; where appropriate,
use &#10;, &#13; where necessary 
(these should not be converted to spaces).
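
One way to read this proposal, sketched in Python under stated assumptions: the function name `tooltip_text` is hypothetical, it operates on the raw (undecoded) attribute value, and it folds literal CR and CRLF as well as LF, which goes slightly beyond "convert line feeds to spaces". Character references are decoded with the standard `html.unescape` after folding, so they are not turned into spaces.

```python
import html
import re

def tooltip_text(raw_title: str) -> str:
    """Apply the proposed rule to a raw title attribute value."""
    # Literal line breaks only keep the source readable: fold each to a space.
    folded = re.sub(r"\r\n|\r|\n", " ", raw_title)
    # References are decoded afterwards, so &#10; and &#8232; survive
    # as real line breaks and line separators.
    return html.unescape(folded)

raw = "A long tool-tip wrapped\nin the source&#10;with one intended break"
print(repr(tooltip_text(raw)))
```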

Chris





Re: [whatwg] several messages about the HTML syntax

2008-03-03 Thread Ian Hickson
On Mon, 3 Mar 2008, Krzysztof Żelechowski wrote:
 
 When I want to define a paragraph-style tool-tip, I am left with the 
 following choice: either make the source code unreadable by making an 
 excessively long line (this is also true for URI attributes but they are 
 not expected to be readable) or make the tool-tip ugly by inserting line 
 breaks.  (It cannot be done in a portable way because the width of the 
 tool-tip window and the fount metrics at the viewer's UI are unknown).

I recommend not making paragraph-long tooltips. That's terrible user 
interface.

-- 
Ian Hickson   U+1047E)\._.,--,'``.fL
http://ln.hixie.ch/   U+263A/,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Re: [whatwg] several messages about the HTML syntax

2008-03-03 Thread David Gerard
On 03/03/2008, Ian Hickson [EMAIL PROTECTED] wrote:
 On Mon, 3 Mar 2008, Krzysztof Żelechowski wrote:

   When I want to define a paragraph-style tool-tip, I am left with the
   following choice: either make the source code unreadable by making an
   excessively long line (this is also true for URI attributes but they are
   not expected to be readable) or make the tool-tip ugly by inserting line
   breaks.  (It cannot be done in a portable way because the width of the
   tool-tip window and the fount metrics at the viewer's UI are unknown).

 I recommend not making paragraph-long tooltips. That's terrible user
  interface.


But how will we read the asides on xkcd.com ?!

(i.e.: If people can do something, they will, and this needs to be
allowed for. ASCII art in tooltips hits my wrong button, but it's
out there. OTOH I've never seen a tooltip in a monospaced font.
User-agents treating all whitespace as spaces and reformatting as
nicely as they can would be fine to me. I'm sure others will come up
with real-life use cases for ridiculously long tooltips.)


- d.


Re: [whatwg] several messages about the HTML syntax

2008-03-03 Thread Jonas Sicking

David Gerard wrote:

On 03/03/2008, Ian Hickson [EMAIL PROTECTED] wrote:

On Mon, 3 Mar 2008, Krzysztof Żelechowski wrote:



  When I want to define a paragraph-style tool-tip, I am left with the
  following choice: either make the source code unreadable by making an
  excessively long line (this is also true for URI attributes but they are
  not expected to be readable) or make the tool-tip ugly by inserting line
  breaks.  (It cannot be done in a portable way because the width of the
  tool-tip window and the fount metrics at the viewer's UI are unknown).



I recommend not making paragraph-long tooltips. That's terrible user
 interface.



But how will we read the asides on xkcd.com ?!

(i.e.: If people can do something, they will, and this needs to be
allowed for. ASCII art in tooltips hits my wrong button, but it's
out there. OTOH I've never seen a tooltip in a monospaced font.
User-agents treating all whitespace as spaces and reformatting as
nicely as they can would be fine to me. I'm sure others will come up
with real-life use cases for ridiculously long tooltips.)


The current spec doesn't forbid paragraph-style tooltips. It just 
doesn't pander to them. This seems like a very good tradeoff.


/ Jonas


Re: [whatwg] several messages about the HTML syntax

2008-03-03 Thread Øistein E . Andersen
On Sun, 2 Mar 2008 23:02:07 + (UTC), Ian Hickson wrote:
On Sat, 15 Sep 2007, Henri Sivonen wrote:
 
 Currently, unquoted attributes may start with a = 
 [as in img alt==Foobar src='404']
 
 This means that the notion of conformance fails to catch what is most 
 likely an error: [...]
 
 To make the notion of conformance more useful for authors (that is, to 
 make conformance checking catch unintentional stuff), I suggest making 
 starting an unquoted attribute value with a = a parse error.

 Done.


 On Mon, 17 Sep 2007, Øistein E. Andersen wrote:
 
 An alternative solution would be to require that unquoted attribute 
 values not contain (single or double ASCII) quotes.

Done.

I really meant to say that disallowing quotation marks in unquoted
attribute values would make conformance checkers able to detect
the particular error pointed out by Mr Sivonen even without disallowing
equals signs.  (The editor may well have noticed this, but his answer
does not reflect this.)

Other potential authoring mistakes pointed out since support the case
for disallowing quotation marks.  I am still not convinced about
the usefulness of disallowing equals signs, but I have not considered
the issue of which characters should be allowed or not in any detail.
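
The two candidate rules discussed here can be compared with a small checker. This is a sketch only: the function name `unquoted_value_errors` and the sample values are hypothetical, and it does not implement the actual HTML5 tokenizer.

```python
def unquoted_value_errors(value: str) -> list[str]:
    """Report which of the two proposed rules an unquoted value violates."""
    errors = []
    # Rule 1 (Sivonen): an unquoted value must not start with '='.
    if value.startswith("="):
        errors.append("starts with '='")
    # Rule 2 (Andersen): an unquoted value must not contain ASCII quotes.
    if '"' in value or "'" in value:
        errors.append("contains a quotation mark")
    return errors

# Hypothetical sample values, as a tokenizer might see them after 'alt='.
for sample in ('="Foobar"', "Foobar", "a=b"):
    print(repr(sample), unquoted_value_errors(sample))
```

In this sketch a value that begins with `=` and is followed by a quoted string trips both rules, which is the point made above: the quotation-mark rule alone would already flag it.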

-- 
Øistein E. Andersen




Re: [whatwg] several messages about the HTML syntax

2008-03-02 Thread Ian Hickson

Executive summary:

 * Changed the &rang; and &lang; entities (which we'd already changed 
   anyway) to something more appropriate. (r1286)

 * Made a number of things parse errors to allow conformance checkers to 
   catch common attribute mistakes. (r1292, r1293, r1299, r1303)

 * Made a number of changes to parsing for compatibility reasons: entities 
   no longer get parsed between comments in RCDATA elements, three more 
   ways to trigger quirks mode, made DOCTYPE parsing not trigger quirks 
   mode if there's trailing garbage (r1294, r1302, r1306)

 * Made entities at the end of an attribute be not a parse error. (r1296)

 * A number of editorial changes. (in range r1286 - r1307)


On Fri, 29 Jun 2007, Henri Sivonen wrote:
 On Jun 29, 2007, at 11:59, Simon Pieters wrote:
  
 U+003E GREATER-THAN SIGN (>)
Parse error. Set the DOCTYPE token's correctness flag to incorrect.
Emit that DOCTYPE token. Switch to the data state.
 
 Should the string (public id or system id) that was being built be 
 dropped on the floor as well?

On Fri, 29 Jun 2007, Simon Pieters wrote:
 
 I don't see a good reason to drop it. The doctype's correctness flag is 
 set to incorrect anyway. But I don't feel strongly about it either way.

Agreed.
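
A rough Python sketch of the step as agreed, keeping the partially built identifier instead of dropping it. All names here (`DoctypeToken`, `on_gt_in_public_id`, the `errors` list) are hypothetical and only the public-identifier case is shown; this is not the spec's tokenizer.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DoctypeToken:
    name: str = ""
    public_id: Optional[str] = None
    system_id: Optional[str] = None
    correct: bool = True  # the "correctness flag"

def on_gt_in_public_id(token: DoctypeToken, partial: str, errors: list) -> str:
    """Handle U+003E while a quoted public identifier is still being built."""
    errors.append("parse error: '>' inside DOCTYPE public identifier")
    token.correct = False       # set the correctness flag to incorrect
    token.public_id = partial   # keep what was collected so far
    return "data"               # caller emits the token, then resumes here

errors: list = []
tok = DoctypeToken(name="html")
next_state = on_gt_in_public_id(tok, "-//W3C//DTD HTML", errors)
print(tok, next_state, errors)
```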


On Fri, 29 Jun 2007, Simon Pieters wrote:
 
 IE seems to not emit the token for > that is in quotes anywhere for both 
 doctypes and bogus comments (or it treats doctypes as bogus comments):
 
!doctype  
!  
?  
/  
 
 This does not apply to these:
 
!-- -- --
% % %

Yeah, I don't think we want to capture IE's complex rules here.


On Sun, 1 Jul 2007, Øistein E. Andersen wrote:

 HTML5 currently maps &lang; and &rang; to
 U+3008 LEFT ANGLE BRACKET,
 U+3009 RIGHT ANGLE BRACKET,
 both belonging to `CJK angle brackets' in
 U+3000--U+303F CJK Symbols and Punctuation.
 
 HTML 4.01 maps them to
 U+2329 LEFT-POINTING ANGLE BRACKET,
 U+232A RIGHT-POINTING ANGLE BRACKET
 from `Angle brackets' in the range
 U+2300--U+23FF Miscellaneous Technical.
 
 Unicode 5.0 notes:
  These are discouraged for mathematical use because of their
  canonical equivalence to CJK punctuation.
 
 It would probably be better to use
 U+27E8 MATHEMATICAL LEFT ANGLE BRACKET,
 U+27E9 MATHEMATICAL RIGHT ANGLE BRACKET
 from `Mathematical brackets' in
 U+27C0--U+27EF Miscellaneous Mathematical Symbols-A,
 characters that did not yet exist when HTML 4.01 was published.

I've made this change.


 This approach is suggested by
 http://unicode.org/Public/math/revision-09/MathMap-9.txt:
  27E8;   lang;   ISOTECH;**  # #10216;  MATHEMATICAL LEFT ANGLE BRACKET
  27E9;   rang;   ISOTECH;**  # #10217;  MATHEMATICAL RIGHT ANGLE BRACKET
 
 Moreover, the (few) browsers I have tested render
 &lang;/&rang;, &#x2329;/&#x232a; and &#x27e8;/&#x27e9; identically
 or similarly (as < / > in approximative ASCII), whereas
 &#x3008;/&#x3009; are rendered as full-width East-Asian
 characters (〈 / 〉).

The browsers I tested were not at all consistent.



On Sun, 1 Jul 2007, L. David Baron wrote:
 
 What's wrong with these mappings, and why shouldn't they also be the 
 mappings in HTML5?

On Sun, 1 Jul 2007, Øistein E. Andersen wrote:
 
 The problem is that they are canonically equivalent to CJK characters.

On Sun, 1 Jul 2007, L. David Baron wrote:
 
 Makes sense.  I think I misread your original message.
 
 (Although changing them at all seems a little scary.)

Well, we'd changed them anyway (since before they mapped to non-canonical 
characters); changing them to something better seems at least partially 
sensible... Browsers are pretty poor on these two entities anyway.
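
The canonical-equivalence concern in this exchange can be checked with Python's `unicodedata` module. This is a sketch: the helper `nfc_codepoint` is hypothetical. For comparison, the standard library's `html.entities.name2codepoint` carries the HTML 4.01 mapping, while `html.unescape` uses the HTML5 table.

```python
import html
import unicodedata
from html.entities import name2codepoint

def nfc_codepoint(ch: str) -> int:
    """Code point of a single character after NFC normalization."""
    return ord(unicodedata.normalize("NFC", ch))

candidates = {
    "HTML 4.01": "\u2329",
    "HTML5 (before)": "\u3008",
    "HTML5 (after)": "\u27e8",
}
for label, ch in candidates.items():
    print(f"{label:15} U+{ord(ch):04X} {unicodedata.name(ch)}"
          f" -> NFC U+{nfc_codepoint(ch):04X}")

# HTML 4.01 table versus HTML5 table for the same entity name.
print(f"U+{name2codepoint['lang']:04X}", f"U+{ord(html.unescape('&lang;')):04X}")
```

Under normalization U+2329 collapses to the CJK bracket U+3008, whereas U+27E8 is left alone, which is the reason given above for preferring the mathematical brackets.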


On Fri, 6 Jul 2007, Simon Pieters wrote:
 On Fri, 22 Jun 2007 04:19:53 +0200, Ian Hickson [EMAIL PROTECTED] wrote:
   
 a ==
   
   Safari, Opera and Firefox drop the attribute. IE has an attribute 
   with the name being the empty string and the value being =. The 
   HTML5 parsing spec says that there should be an attribute with the 
   name = and the value the empty string. The Before attribute name 
   state part of the parsing spec might have to be revisited.
  
  I don't see any harm in leaving the spec as-is here, given the lack of 
  interoperability and the fact that there's no real reason to be using 
  attributes with this name anyway. Whatever's simplest to implement is 
  probably best here.
 
 Since it doesn't match any browser, and probably is an authoring mistake 
 (that would silently pass conformance checking in the case of embed), 
 could it be a parse error? (Also update the wording in the syntax 
 section if so.)

Done.


On Mon, 16 Jul 2007, Henri Sivonen wrote:
 
 In the Data State the spec says:
  U+0026 AMPERSAND (&)
  When the content model flag is set to one of the PCDATA or RCDATA
  states: switch to the entity data state.
  Otherwise: treat it as per the anything else entry below.
 
 html5lib tests, WebKit trunk, Firefox 2.0.0.4 and