Comments follow.
1. Definition of string. You say:
--
However, ECMAScript does not place any restrictions or requirements on the
sequence of code units in a String value, so it may be ill-formed when
interpreted as a UTF-16 code unit sequence.
--
I know what you mean, but others might not. Perhaps:
--
However, ECMAScript does not place any restrictions or requirements on the
sequence of code units in a String value, so the sequence of code units may
contain code units that are not valid in Unicode or sequences that do not
represent Unicode code points (such as unpaired surrogates).
--
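To make "ill-formed" concrete, here is a minimal sketch of a well-formedness
check. The function name isWellFormedUTF16 is my own, hypothetical, and not
part of the proposal or of any ECMAScript edition I am citing:

```javascript
// Hypothetical helper: returns true if every surrogate code unit in `s`
// belongs to a high/low pair, i.e. `s` is well-formed UTF-16.
function isWellFormedUTF16(s) {
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit >= 0xD800 && unit <= 0xDBFF) {
      // High surrogate: the next code unit must be a low surrogate.
      const next = i + 1 < s.length ? s.charCodeAt(i + 1) : 0;
      if (next < 0xDC00 || next > 0xDFFF) return false;
      i++; // skip the low surrogate we just validated
    } else if (unit >= 0xDC00 && unit <= 0xDFFF) {
      // Low surrogate with no preceding high surrogate.
      return false;
    }
  }
  return true;
}

const paired = "\uD83D\uDE00";  // a complete surrogate pair
const unpaired = "abc\uD83D";   // ends in a lone high surrogate
const pairedOk = isWellFormedUTF16(paired);
const unpairedOk = isWellFormedUTF16(unpaired);
```

Both strings are legal String values; only the first is well-formed UTF-16.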
2. In this section, I would define string after code unit and code point. I
would also include a definition of surrogates/surrogate pairs.
3. Under "text interpretation" you say:
--
For compatibility with existing applications, it has to allow surrogate code
points (code points between U+D800 and U+DFFF which can never represent
characters).
--
This would (see above) benefit from having a definition in place. As written,
it is also slightly incomplete, since surrogate code units are used in pairs
to form supplementary characters. Perhaps:
--
For compatibility with existing applications, it has to allow surrogate code
points (code points between U+D800 and U+DFFF which do not individually
represent characters).
--
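As an illustration of "do not individually represent characters", this sketch
applies the standard UTF-16 arithmetic to combine a pair into one
supplementary code point. The function name is hypothetical:

```javascript
// Hypothetical helper: combines a high and a low surrogate code unit
// into the supplementary code point they jointly encode in UTF-16.
function codePointFromSurrogates(high, low) {
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

// 0xD83D alone and 0xDE00 alone are not characters;
// together they encode U+1F600.
const combined = codePointFromSurrogates(0xD83D, 0xDE00);
```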
4. 0xFFFE and 0xFFFF are non-characters in Unicode. I do think you do the right
thing here. It's just a nit that you never note this ;-).
5. Unnecessary editorializing ;-):
--
This transformation is rather ugly, but I’m afraid it’s the price ECMAScript
has to pay for being 12 years late in supporting supplementary characters.
--
6. Under 'details' you suggest a number of renamings. Are these strictly
necessary? The term 'character' could be taken to mean 'code point' instead,
with an explanatory note.
7. Skipping down a lot, to "section 6 source text", you propose:
--
The text is expected to have been normalised to Unicode Normalization Form C
(Canonical Decomposition, followed by Canonical Composition), as described in
Unicode Standard Annex #15.
--
I think this should be removed or modified. Automatic application of NFC is not
always desirable, as it can affect presentation or processing. Perhaps:
--
Normalization of the text to Unicode Normalization Form C (Canonical
Decomposition, followed by Canonical Composition), as described in Unicode
Standard Annex #15, is recommended when transcoding from another character
encoding.
--
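A small sketch of why automatic NFC can matter: normalization may replace one
code point with a different one. It assumes an engine that provides
String.prototype.normalize, which comes from a later edition of the language
than the one under discussion here:

```javascript
// Assumes String.prototype.normalize is available (a later-edition feature).
const angstromSign = "\u212B";                  // ANGSTROM SIGN
const normalized = angstromSign.normalize("NFC");
// NFC maps U+212B to U+00C5 (LATIN CAPITAL LETTER A WITH RING ABOVE),
// so the normalized text no longer compares equal to the original.
const changed = normalized !== angstromSign;

// A base letter plus a combining mark composes into one code point.
const composed = "A\u030A".normalize("NFC");
```

Whether such a change is acceptable depends on what the text is used for,
which is why I would only recommend NFC rather than expect it.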
8. In "7.6 Identifier Names and Identifiers" you don't actually forbid unpaired
surrogates or non-characters in the text (Identifier_Part:: does this by
implication). Perhaps state it? Also, ZWJ and ZWNJ are permitted as the last
character in an identifier.
9. "15.5.4.6": you say "(a nonnegative integer less than 0x10FFFF)", whereas it
should say: "(a nonnegative integer less than or equal to 0x10FFFF)"
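The inclusive upper bound can be sketched as follows. The helper name is
hypothetical, and String.fromCodePoint is assumed to behave as in later
editions of the language, where U+10FFFF is accepted and 0x110000 is not:

```javascript
// Hypothetical helper: the valid range is 0 through 0x10FFFF, inclusive.
function isValidCodePoint(cp) {
  return Number.isInteger(cp) && cp >= 0 && cp <= 0x10FFFF;
}

const lastValid = isValidCodePoint(0x10FFFF);    // the boundary case at issue
const firstInvalid = isValidCodePoint(0x110000);

// Assumes String.fromCodePoint exists; U+10FFFF encodes as a surrogate pair.
const highest = String.fromCodePoint(0x10FFFF);
```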
10. In the section on "what about utf-32", you say: "and the code points start
at positions 1, 2, 3." Of course this should be "... and the code points start
at positions 0, 1, 2."
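For comparison, a sketch of zero-based start positions for a three-code-point
string. The sample string and the helper name are my own assumptions, not
taken from the proposal, and the syntax used comes from later editions of the
language:

```javascript
// Hypothetical sample: three code points, the middle one supplementary.
const sample = "a\u{1F600}b";

// Start positions counted in code points (what UTF-32 indexing would give).
const codePointStarts = Array.from(sample).map((_, index) => index);

// Hypothetical helper: start positions counted in UTF-16 code units.
function codeUnitStarts(s) {
  const starts = [];
  let position = 0;
  for (const ch of s) {      // string iteration proceeds by code point
    starts.push(position);
    position += ch.length;   // 1 for BMP, 2 for supplementary
  }
  return starts;
}

const utf16Starts = codeUnitStarts(sample);
```

With code point indexing the starts are 0, 1, 2; with UTF-16 code unit
indexing the last one moves to 3 because the middle character takes two code
units.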
Thanks for this proposal!
Addison
> -----Original Message-----
> From: Norbert Lindenberg [mailto:[email protected]]
> Sent: Thursday, March 22, 2012 10:14 PM
> To: [email protected]
> Subject: Re: Full Unicode based on UTF-16 proposal
>
> I've updated the proposal based on the feedback received so far. Changes are
> listed in the Updates section.
> http://norbertlindenberg.com/2012/03/ecmascript-supplementary-characters/
>
> Norbert
>
>
> On Mar 16, 2012, at 0:18 , Norbert Lindenberg wrote:
>
> > Based on my prioritization of goals for support for full Unicode in
> > ECMAScript [1], I've put together a proposal for supporting the full
> > Unicode character set based on the existing representation of text in
> > ECMAScript using UTF-16 code unit sequences:
> > http://norbertlindenberg.com/2012/03/ecmascript-supplementary-characters/index.html
> >
> > The detailed proposed spec changes serve to get a good idea of the scope of
> > the changes, but will need some polishing.
> >
> > Comments?
> >
> > Thanks,
> > Norbert
> >
> > [1] https://mail.mozilla.org/pipermail/es-discuss/2012-February/020721.html
> >
>
_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss