Re: Working with grapheme clusters

Jason Orendorff Fri, 25 Oct 2013 18:36:32 -0700

On Thu, Oct 24, 2013 at 7:38 AM, Anne van Kesteren <[email protected]> wrote:
> On Thu, Oct 24, 2013 at 3:31 PM, Mathias Bynens <[email protected]> wrote:
>> Imagine you’re writing a JavaScript library that escapes a given string as 
>> an HTML character reference, or as a CSS identifier, or anything else. In 
>> those cases, you don’t care about grapheme clusters, you care about code 
>> points, cause those are the units you end up escaping individually.
>
> Is that really a common operation? I would expect formatting,
> searching, etc. to dominate. E.g. whenever you do substr/substring you
> would want that to be grapheme-cluster aware.


I think I disagree. Trying to take this apart:

If you're searching, you don't want to use the iterator anyway,
because finding character boundaries or grapheme boundaries is a waste
of time. UTF-16 is designed so that you can search based on code units
alone, without computing boundaries. RegExp searches fall in this
category.
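
To make that concrete, here is a minimal sketch, assuming an engine that
supports the ES6 \u{...} escape. The strings and variable names are made
up for illustration. Lead and trail surrogates occupy disjoint ranges, so
a well-formed needle cannot match starting in the middle of a surrogate
pair, and a plain code unit search lands on a code point boundary without
ever computing one:

```javascript
// Both U+1F4A9 and U+1D306 are outside the BMP, so each takes two code units.
const haystack = "a\u{1F4A9}b\u{1D306}c";
const needle = "\u{1D306}";

// indexOf compares code units only; it never looks for boundaries.
const offset = haystack.indexOf(needle);

// A RegExp written in terms of the surrogate pair finds the same offset.
const viaRegExp = haystack.search(/\uD834\uDF06/);

// The offset is directly usable with slice.
const found = haystack.slice(offset, offset + needle.length);
```

Here offset and viaRegExp are both code unit offsets, which is what
slice/substring expect anyway.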

IIUC, "formatting" mostly involves finding patterns to replace, so it's a
special case of searching, right?

When you do substr/slice/substring, you should be using offsets that
are on grapheme boundaries, but obtaining offsets by using String
iteration and adding up the lengths will be very rare, I think.
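
For reference, the pattern I mean would look roughly like the sketch
below. offsetOfCodePoint is a hypothetical helper, not a proposed API, and
it assumes String iteration yields one code point per step, as in the
current ES6 drafts. Note that it only gets you code point boundaries; it
does nothing for grapheme boundaries:

```javascript
// Hypothetical helper: code unit offset at which the n-th code point starts.
// Returns str.length if the string has n or fewer code points.
function offsetOfCodePoint(str, n) {
  let offset = 0;
  let seen = 0;
  for (const ch of str) {
    if (seen === n) break;
    offset += ch.length; // 1 for BMP code points, 2 for surrogate pairs
    seen++;
  }
  return offset;
}

const s = "x\u{1F4A9}yz";
const start = offsetOfCodePoint(s, 2); // skip "x" and U+1F4A9
const tail = s.slice(start);
```

In practice offsets tend to come from a search, a match result, or a
selection, not from counting like this.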

So String iteration is kind of left looking around for a use case. I
can't think of any that compel me to prefer graphemes over characters
out of sheer practicality. Reversing strings, for example, is not
something I can bring myself to care about. Anyone?
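
For comparison, the escaping case Mathias describes upthread is one where
code points are clearly the right unit. A minimal sketch, assuming the ES6
draft's for-of over strings and String.prototype.codePointAt; toCharRefs
is a made-up name, and it escapes every code point for simplicity:

```javascript
// Hypothetical helper: escape each code point as an HTML hexadecimal
// character reference.
function toCharRefs(str) {
  let out = "";
  for (const ch of str) {
    // ch holds one code point, so codePointAt(0) returns its full value.
    out += "&#x" + ch.codePointAt(0).toString(16).toUpperCase() + ";";
  }
  return out;
}

const escaped = toCharRefs("a\u{1D306}");
```

Iterating by grapheme cluster here would only mean splitting each cluster
back into code points before escaping.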

-j
_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss
