Re: Working with grapheme clusters

Norbert Lindenberg Sat, 26 Oct 2013 20:13:24 -0700

On Oct 26, 2013, at 6:58 , Jason Orendorff <[email protected]> wrote:

> On Fri, Oct 25, 2013 at 11:42 PM, Norbert Lindenberg
> <[email protected]> wrote:
>> 
>> On Oct 25, 2013, at 18:35 , Jason Orendorff <[email protected]> 
>> wrote:
>> 
>>> UTF-16 is designed so that you can search based on code units
>>> alone, without computing boundaries. RegExp searches fall in this
>>> category.
>> 
>> Not if the RegExp is case insensitive, or uses a character class, or ".", or 
>> a quantifier - these all require looking at code points rather than UTF-16 
>> code units in order to support the full Unicode character set.
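A minimal sketch of both points above, assuming an ES2015+ engine such as Node.js. The `u` flag it uses to switch RegExps to code points was standardized in ES2015, after this thread. The variable names are illustrative only. Plain substring search on code units works, but ".", character classes and case-insensitive matching all go wrong on supplementary characters unless the pattern is read as code points:

```javascript
// Sketch assuming an ES2015+ engine; the `u` flag postdates this thread.
const pile = "\uD83D\uDCA9"; // U+1F4A9 PILE OF POO as a UTF-16 surrogate pair
const haystack = "x" + pile + "y";

// Code-unit search works: lead (D800-DBFF) and trail (DC00-DFFF) ranges are
// disjoint, so a well-formed needle cannot match starting inside another character.
const found = haystack.indexOf(pile);

// "." matches one code unit, so it cannot match the whole pair...
const dotLegacy = /^.$/.test(pile);
// ...unless the RegExp operates on code points.
const dotUnicode = /^.$/u.test(pile);

// Without `u`, a class containing the pair is a class of two separate code units.
const classLegacy = /^[\uD83D\uDCA9]$/.test(pile);
const classUnicode = /^[\uD83D\uDCA9]$/u.test(pile);

// Case-insensitive matching of Deseret U+10400 (capital) and U+10428 (small).
const caseLegacy = /\uD801\uDC00/i.test("\uD801\uDC28");
const caseUnicode = /\u{10400}/iu.test("\u{10428}");

console.log(found, dotLegacy, dotUnicode, classLegacy, classUnicode, caseLegacy, caseUnicode);
```

In each pair the code-unit-based form fails and the code-point-based form succeeds, while `indexOf` finds the character at index 1 without any boundary computation.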

> I'd like to know what you have in mind regarding quantifiers though.

When I write /💩{2}/, I mean /💩💩/, but the current code-unit-based RegExp will
interpret it as /💩\uDCA9/ (the {2} applies only to the trail surrogate \uDCA9),
which can't match any well-formed UTF-16 string.
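A minimal sketch of the quantifier case, assuming an ES2015+ engine such as Node.js, where the later-standardized `u` flag makes the pattern text itself be read as code points. The same source string, /💩{2}/, is compiled with and without `u`; variable names are illustrative:

```javascript
// Sketch assuming an ES2015+ engine; the `u` flag was added after this thread.
const pile = "\uD83D\uDCA9"; // U+1F4A9 as a surrogate pair
const source = pile + "{2}"; // the pattern text of /💩{2}/

// Without `u`, {2} repeats only the last code unit (the trail surrogate \uDCA9).
const legacy = new RegExp(source);
const legacyWellFormed = legacy.test(pile + pile);     // two complete characters
const legacyIllFormed = legacy.test(pile + "\uDCA9");  // pair plus a lone trail surrogate

// With `u`, the pattern is read as code points, so {2} repeats the whole character.
const unicode = new RegExp(source, "u");
const unicodeWellFormed = unicode.test(pile + pile);

console.log(legacyWellFormed, legacyIllFormed, unicodeWellFormed);
```

The code-unit version rejects "💩💩" and matches only the ill-formed string, exactly as described above; the code-point version matches "💩💩".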

Norbert
_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss
