RE: Full Unicode based on UTF-16 proposal

Phillips, Addison Fri, 23 Mar 2012 09:47:10 -0700

Comments follow.

1. Definition of string. You say:

--
However, ECMAScript does not place any restrictions or requirements on the
sequence of code units in a String value, so it may be ill-formed when
interpreted as a UTF-16 code unit sequence.
--

I know what you mean, but others might not. Perhaps:

--
However, ECMAScript does not place any restrictions or requirements on the 
sequence of code units in a String value, so the sequence of code units may 
contain code units that are not valid in Unicode or sequences that do not 
represent Unicode code points (such as unpaired surrogates).
--
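
To make the distinction concrete, here is a minimal sketch of a well-formed and
an ill-formed String value. The helper name isWellFormedUTF16 is hypothetical
and is not part of the proposal; it only illustrates what "unpaired surrogate"
means at the code unit level.

```javascript
// A well-formed string: U+1F600 encoded as the surrogate pair D83D DE00.
const wellFormed = "\uD83D\uDE00";

// An ill-formed string: a high surrogate with no low surrogate after it.
const lone = "\uD83D";

// Hypothetical helper (an assumption for illustration): returns true only if
// every surrogate code unit in the string is part of a proper pair.
function isWellFormedUTF16(s) {
  for (let i = 0; i < s.length; i++) {
    const cu = s.charCodeAt(i);
    if (cu >= 0xD800 && cu <= 0xDBFF) {
      // High surrogate: must be followed by a low surrogate.
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next < 0xDC00 || next > 0xDFFF) return false;
      i++; // skip the low surrogate that was just validated
    } else if (cu >= 0xDC00 && cu <= 0xDFFF) {
      // Low surrogate with no preceding high surrogate.
      return false;
    }
  }
  return true;
}
```

Both values are legal ECMAScript Strings; only the first is valid UTF-16.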

2. In this section, I would define string after code unit and code point. I 
would also include a definition of surrogates/surrogate pairs.

3. Under "text interpretation" you say:

--
For compatibility with existing applications, it has to allow surrogate code
points (code points between U+D800 and U+DFFF which can never represent
characters).
--

This would (see above) benefit from having a definition of surrogates in place.
As written, it is slightly incomplete, since surrogate code units are used in
pairs to form supplementary characters. Perhaps:

--
For compatibility with existing applications, it has to allow surrogate code 
points (code points between U+D800 and U+DFFF which do not individually 
represent characters).
--
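
As an illustration of "do not individually represent characters", the sketch
below combines a high and a low surrogate into one supplementary code point
using the standard UTF-16 arithmetic. It cross-checks against codePointAt,
which was standardized after this thread, so treat that call as an assumption
about the runtime rather than as part of the proposal.

```javascript
// U+1D504 MATHEMATICAL FRAKTUR CAPITAL A, stored as two code units.
const fraktur = "\uD835\uDD04";

const high = fraktur.charCodeAt(0); // high surrogate code unit
const low = fraktur.charCodeAt(1);  // low surrogate code unit

// Standard UTF-16 decoding: each surrogate contributes 10 bits, and the
// result is offset by 0x10000 to land in the supplementary planes.
const combined = (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;

// Assumes a runtime that provides String.prototype.codePointAt.
const viaBuiltin = fraktur.codePointAt(0);
```

Neither code unit names a character by itself; only the pair does.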

4. 0xFFFE and 0xFFFF are non-characters in Unicode. I do think you do the right 
thing here. It's just a nit that you never note this ;-).

5. This editorial comment is unnecessary ;-):

--
This transformation is rather ugly, but I’m afraid it’s the price ECMAScript
  has to pay for being 12 years late in supporting supplementary characters.
--

6. Under 'details' you suggest a number of renamings. Are these strictly 
necessary? The term 'character' could be taken to mean 'code point' instead, 
with an explanatory note.

7. Skipping down a lot, to "section 6 source text", you propose:

--
The text is expected to have been normalised to Unicode Normalization Form C
(Canonical Decomposition, followed by Canonical Composition), as described in
Unicode Standard Annex #15.
--

I think this should be removed or modified. Automatic application of NFC is not 
always desirable, as it can affect presentation or processing. Perhaps:

--
Normalization of the text to Unicode Normalization Form C (Canonical 
Decomposition, followed by Canonical Composition), as described in Unicode 
Standard Annex #15, is recommended when transcoding from another character 
encoding.
--
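
A small sketch of why automatic NFC can affect processing: normalization
changes the underlying code unit sequence, so lengths and equality comparisons
change with it. This assumes a runtime with String.prototype.normalize, which
was added to the language after this thread.

```javascript
const decomposed = "e\u0301"; // "e" followed by COMBINING ACUTE ACCENT
const composed = "\u00E9";    // LATIN SMALL LETTER E WITH ACUTE

// Assumes String.prototype.normalize is available in the runtime.
const normalized = decomposed.normalize("NFC");

const sameBefore = decomposed === composed; // compared as raw code units
const sameAfter = normalized === composed;  // compared after NFC

const lengthBefore = decomposed.length; // two code units
const lengthAfter = normalized.length;  // one code unit
```

Source text that relies on the decomposed form, for example in a string
literal used as test data, would be silently altered if NFC were applied
automatically.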

8. In "7.6 Identifier Names and Identifiers" you don't actually forbid unpaired 
surrogates or non-characters in the text (Identifier_Part:: does this by 
implication). Perhaps state it? Also, ZWJ and ZWNJ are permitted as the last 
character in an identifier.

9. "15.5.4.6": you say "(a nonnegative integer less than 0x10FFFF)", whereas it 
should say: "(a nonnegative integer less than or equal to 0x10FFFF)"
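
The off-by-one matters because U+10FFFF itself is a valid code point. A minimal
sketch of the inclusive bound follows; isCodePoint is a hypothetical helper,
and String.fromCodePoint is a later standardized function that is assumed to
be available in the runtime.

```javascript
const MAX_CODE_POINT = 0x10FFFF;

// Hypothetical range check (an assumption for illustration): the upper bound
// is inclusive.
function isCodePoint(n) {
  return Number.isInteger(n) && n >= 0 && n <= MAX_CODE_POINT;
}

// The last code point is accepted and is encoded as a surrogate pair.
const lastCodePoint = String.fromCodePoint(MAX_CODE_POINT);

// One past the end is rejected with a RangeError.
let rejected = false;
try {
  String.fromCodePoint(MAX_CODE_POINT + 1);
} catch (e) {
  rejected = e instanceof RangeError;
}
```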

10. In the section on "what about utf-32", you say: " and the code points start 
at positions 1, 2, 3.". Of course this should be "... and the code points start 
at positions 0, 1, 2."
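
For comparison, a sketch of the two indexing schemes on a three-code-point
string with one supplementary character in the middle. It assumes a runtime
with string iteration by code point (for...of, Array.from), which postdates
this thread; the variable names are illustrative only.

```javascript
const text = "a\uD83D\uDE00b"; // "a", U+1F600, "b"

// UTF-32-style view: one element per code point, so positions are 0, 1, 2.
const codePoints = Array.from(text, (ch) => ch.codePointAt(0));
const utf32Positions = codePoints.map((_, i) => i);

// UTF-16 view: start offset of each code point in the code unit sequence.
const utf16Positions = [];
let offset = 0;
for (const ch of text) {
  utf16Positions.push(offset);
  offset += ch.length; // 1 code unit for BMP, 2 for supplementary
}
```

Here utf32Positions is [0, 1, 2] while utf16Positions is [0, 1, 3].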

Thanks for this proposal!

Addison

> -----Original Message-----
> From: Norbert Lindenberg [mailto:[email protected]]
> Sent: Thursday, March 22, 2012 10:14 PM
> To: [email protected]
> Subject: Re: Full Unicode based on UTF-16 proposal
> 
> I've updated the proposal based on the feedback received so far. Changes are
> listed in the Updates section.
> http://norbertlindenberg.com/2012/03/ecmascript-supplementary-characters/
> 
> Norbert
> 
> 
> On Mar 16, 2012, at 0:18 , Norbert Lindenberg wrote:
> 
> > Based on my prioritization of goals for support for full Unicode in
> > ECMAScript [1], I've put together a proposal for supporting the full
> > Unicode character set based on the existing representation of text in
> > ECMAScript using UTF-16 code unit sequences:
> > http://norbertlindenberg.com/2012/03/ecmascript-supplementary-characters/index.html
> >
> > The detailed proposed spec changes serve to get a good idea of the scope of
> > the changes, but will need some polishing.
> >
> > Comments?
> >
> > Thanks,
> > Norbert
> >
> > [1] https://mail.mozilla.org/pipermail/es-discuss/2012-February/020721.html
> >
> 

_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss
