Re: Full Unicode based on UTF-16 proposal

Erik Corry Fri, 16 Mar 2012 19:57:08 -0700

2012/3/17 Norbert Lindenberg <[email protected]>:
> Thanks for your comments - a few replies below.
>
> Norbert
>
>
> On Mar 16, 2012, at 1:55 , Erik Corry wrote:
>
>> However I think we probably do want the /u modifier on regexps to
>> control the new backward-incompatible behaviour.  There may be some
>> way to relax this for regexp literals in opted in Harmony code, but
>> for new RegExp(...) and for other string literals I think there are
>> rather too many inconsistencies with the old behaviour.
>
> Before asking developers to add /u, we should really have some evidence that 
> not doing so would cause actual compatibility issues for real applications. 
> Do you know of any examples?


No.  In general I don't think it is realistic to try to prove that
problematic code does not exist, since that requires quantifying over
all existing JS code, which is clearly impossible.
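
To make the concern concrete, here is a small sketch of a pattern whose result depends on whether matching works on code units or on code points. It assumes an engine that implements the /u flag as it was later standardized in ES2015; the variable names are only for illustration.

```javascript
// U+1D11E (musical G clef) is one code point but two UTF-16 code units.
const clef = "\uD834\uDD1E";

// Without /u, "." matches a single code unit, so one "." cannot span the pair.
const legacySingle = /^.$/.test(clef);

// Without /u, two dots are needed to cover the surrogate pair.
const legacyPair = /^..$/.test(clef);

// With /u, "." matches a whole code point.
const unicodeSingle = /^.$/u.test(clef);

console.log(legacySingle, legacyPair, unicodeSingle); // false true true
```

Existing code that counts on the second form would see different results if the code point behaviour were switched on without an opt-in.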

> Good point about Harmony code, although it seems opt-in got replaced by being 
> part of a module.

That would work too, I think.

>> The algorithm given for codePointAt never returns NaN.  It should
>> probably do that for indices that hit a trail surrogate that has a
>> lead surrogate preceding it.
>
> NaN is not a valid code point, so it shouldn't be returned. If we want to 
> indicate access to a trailing surrogate code unit as an error, we should 
> throw an exception.

Then you should probably remove the text: "If there is no code unit at
that position, the result is NaN" from your proposal :-)

I am wary of using exceptions for non-exceptional data-driven events,
since performance is usually terrible and it's arguably an abuse of
the mechanism.  Your iterator code looks fine to me and needs neither
NaN nor exceptions.
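
For illustration, a minimal sketch of that kind of iteration over UTF-16 code units. This is not the code from the proposal; the function name codePoints is hypothetical, and passing unpaired surrogates through as their own code unit value is an assumption made here.

```javascript
// Hypothetical helper: returns the code points of a string as an array.
// It never needs NaN or an exception, because it only reads positions
// inside the string and always advances past a complete surrogate pair.
function codePoints(s) {
  const result = [];
  for (let i = 0; i < s.length; i++) {
    const lead = s.charCodeAt(i);
    if (lead >= 0xD800 && lead <= 0xDBFF && i + 1 < s.length) {
      const trail = s.charCodeAt(i + 1);
      if (trail >= 0xDC00 && trail <= 0xDFFF) {
        // Combine the pair into one supplementary code point.
        result.push((lead - 0xD800) * 0x400 + (trail - 0xDC00) + 0x10000);
        i++; // skip the trail surrogate already consumed
        continue;
      }
    }
    // BMP code unit, or an unpaired surrogate passed through unchanged.
    result.push(lead);
  }
  return result;
}

console.log(codePoints("a\uD834\uDD1Eb")); // [ 97, 119070, 98 ]
```

Because the loop skips the trail surrogate itself, a caller never lands on the second half of a pair, which is why the question of what to return there does not come up.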

>> Perhaps it is outside the scope of this proposal, but it would also
>> make a lot of sense to add some named character classes to RegExp.
>
> It would make a lot of sense, but is outside the scope of this proposal. One 
> step at a time :-)

I can see that.  But if we are going to have multiple versions of the
RegExp syntax we should probably aim to keep the number down.

>> If we are making a /u modifier for RegExp it would also be nice to get
>> rid of the incorrect case independent matching rules.  This is the
>> section that says: "If ch's code unit value is greater than or equal
>> to decimal 128 and cu's code unit value is less than decimal 128,
>> then return ch."
>
> And the exception for "ß" and other characters whose upper case equivalent 
> has more than one code point ("If u does not consist of a single character, 
> return ch." in the Canonicalize algorithm in ES 5.1).

Yes.
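
As a rough sketch of the two rules quoted above, here is the ES 5.1 Canonicalize step for case-insensitive matching written out as plain code. The function name canonicalize is only for illustration, and String.prototype.toUpperCase is assumed to stand in for the spec's uppercase conversion.

```javascript
// Sketch of ES 5.1 Canonicalize for a single character ch when /i is set.
function canonicalize(ch) {
  const u = ch.toUpperCase();
  // Rule 1: if the uppercase form is not a single character, keep ch.
  if (u.length !== 1) return ch;
  // Rule 2: a non-ASCII character may not canonicalize to an ASCII one.
  if (ch.charCodeAt(0) >= 128 && u.charCodeAt(0) < 128) return ch;
  return u;
}

console.log(canonicalize("a"));      // "A"
console.log(canonicalize("\u00DF")); // "ß" stays as is, since its uppercase is "SS"
console.log(canonicalize("\u017F")); // "ſ" stays as is, since its uppercase "S" is ASCII
```

Two characters match under /i only if they canonicalize to the same value, so with these rules /\u017F/i does not match "s" even though "ſ" uppercases to "S".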


>> 2012/3/16 Norbert Lindenberg <[email protected]>:
>>> Based on my prioritization of goals for support for full Unicode in 
>>> ECMAScript [1], I've put together a proposal for supporting the full 
>>> Unicode character set based on the existing representation of text in 
>>> ECMAScript using UTF-16 code unit sequences:
>>> http://norbertlindenberg.com/2012/03/ecmascript-supplementary-characters/index.html
>>>
>>> The detailed proposed spec changes serve to get a good idea of the scope of 
>>> the changes, but will need some polishing.
>>>
>>> Comments?
>>>
>>> Thanks,
>>> Norbert
>>>
>>> [1] https://mail.mozilla.org/pipermail/es-discuss/2012-February/020721.html
>>>
>>> _______________________________________________
>>> es-discuss mailing list
>>> [email protected]
>>> https://mail.mozilla.org/listinfo/es-discuss
>