Re: Full Unicode based on UTF-16 proposal

Mark Davis ☕ Fri, 16 Mar 2012 16:23:32 -0700

Whew, a lot of work, Norbert. Looks quite good. My one question is whether
it is worth having a mechanism for iteration.


OLD CODE
for (var i = 0; i < s.length; ++i) {
  var x = s.charAt(i);
  // do something with x
}

Using your mechanism, one would write:

NEW CODE
for (var i = 0; i < s.length; ++i) {
  var x = s.codePointAt(i);
  // do something with x
  if (x > 0xFFFF) {
    ++i; // supplementary code point: skip its trail surrogate
  }
}
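
The extra ++i is needed because a code point above 0xFFFF occupies two UTF-16 code units, a lead surrogate followed by a trail surrogate. A rough sketch of that arithmetic follows. The name codePointAtSketch is hypothetical, and returning an unpaired surrogate unchanged is an assumption here; the proposal's own algorithm is the authority on edge cases.

```javascript
// Hypothetical helper, not part of the proposal: decode the code point
// that starts at UTF-16 code unit index i of string s.
function codePointAtSketch(s, i) {
  var lead = s.charCodeAt(i);
  // Not a lead surrogate, or no code unit follows: return the unit itself.
  if (lead < 0xD800 || lead > 0xDBFF || i + 1 >= s.length) {
    return lead;
  }
  var trail = s.charCodeAt(i + 1);
  // Assumption: a lead surrogate without a trail surrogate is returned as-is.
  if (trail < 0xDC00 || trail > 0xDFFF) {
    return lead;
  }
  // Combine the surrogate pair into a supplementary code point (> 0xFFFF).
  return (lead - 0xD800) * 0x400 + (trail - 0xDC00) + 0x10000;
}
```

For the string "\uD83D\uDE00" this returns 0x1F600 at index 0, which is why the loop has to step over index 1.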

In Java, for example, I *really* wish you could write:

DESIRED

for (int codepoint : s) {
  // do something with codepoint
}

However, maybe this kind of iteration is rare enough in ES that it suffices
to document the pattern under NEW CODE.
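
Short of a language-level mechanism, the NEW CODE pattern can be wrapped once in a helper so that callers never touch the index. This is only a sketch: forEachCodePoint is an illustrative name, not something from the proposal, and it assumes an engine in which String.prototype.codePointAt is available and behaves as described above.

```javascript
// Hypothetical helper (illustrative name): call fn once per code point
// of s, hiding the index adjustment needed for supplementary characters.
function forEachCodePoint(s, fn) {
  for (var i = 0; i < s.length; ++i) {
    var x = s.codePointAt(i);
    fn(x, i); // i is the code unit index where the code point starts
    if (x > 0xFFFF) {
      ++i; // supplementary code point: skip its trail surrogate
    }
  }
}

// Usage: collect the code points of a string containing U+1F600.
var seen = [];
forEachCodePoint("a\uD83D\uDE00b", function (x) {
  seen.push(x);
});
// seen now holds 0x61, 0x1F600, 0x62
```

The callback also receives the starting index, in case the caller needs to slice the original string.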

Thanks for all your work!


> proposal for upgrading ECMAScript to a Unicode version released in this
> century

This was amusing; could have said "this millennium" ;-)
------------------------------
Mark <https://plus.google.com/114199149796022210033>
*— Il meglio è l’inimico del bene —*



On Fri, Mar 16, 2012 at 01:55, Erik Corry <[email protected]> wrote:

> This is very useful, and was surely a lot of work.  I like the general
> thrust of it a lot.  It has a high level of backwards compatibility,
> does not rely on the VM having two different string implementations in
> it, and it seems to fix the issues people are encountering.
>
> However I think we probably do want the /u modifier on regexps to
> control the new backward-incompatible behaviour.  There may be some
> way to relax this for regexp literals in opted in Harmony code, but
> for new RegExp(...) and for other string literals I think there are
> rather too many inconsistencies with the old behaviour.
>
> The algorithm given for codePointAt never returns NaN.  It should
> probably do that for indices that hit a trail surrogate that has a
> lead surrogate preceding it.
>
> Perhaps it is outside the scope of this proposal, but it would also
> make a lot of sense to add some named character classes to RegExp.
>
> If we are making a /u modifier for RegExp it would also be nice to get
> rid of the incorrect case independent matching rules.  This is the
> section that says: "If ch's code unit value is greater than or equal
> to decimal 128 and cu's code unit value is less than decimal 128,
> then return ch."
>
> 2012/3/16 Norbert Lindenberg <[email protected]>:
> > Based on my prioritization of goals for support for full Unicode in
> > ECMAScript [1], I've put together a proposal for supporting the full
> > Unicode character set based on the existing representation of text in
> > ECMAScript using UTF-16 code unit sequences:
> >
> > http://norbertlindenberg.com/2012/03/ecmascript-supplementary-characters/index.html
> >
> > The detailed proposed spec changes serve to get a good idea of the scope
> > of the changes, but will need some polishing.
> >
> > Comments?
> >
> > Thanks,
> > Norbert
> >
> > [1] https://mail.mozilla.org/pipermail/es-discuss/2012-February/020721.html
> >
> > _______________________________________________
> > es-discuss mailing list
> > [email protected]
> > https://mail.mozilla.org/listinfo/es-discuss

