On 24 Oct 2013, at 16:02, Claude Pache <[email protected]> wrote:

> Therefore, I propose the following basic operations to operate on grapheme 
> clusters:

Out of curiosity, is there any programming language that operates on grapheme 
clusters (rather than code points) by default? FWIW, code point iteration is 
what I’d expect in any language.

>       text.graphemeAt(0) // get the first grapheme of the text
> 
>       // shorten a text to its first hundred graphemes
>       var shortenText = ''
>       let numGraphemes = 0
>       for (let grapheme of text) {
>               numGraphemes += 1
>               if (numGraphemes > 100) {
>                       shortenText += '…'
>                       break
>               }
>               shortenText += grapheme
>       }

So, you would want to change the string iterator’s behavior too?
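For comparison only: engines that implement `Intl.Segmenter` can express the truncation above without changing the string iterator. `Intl.Segmenter` is part of the ECMAScript Internationalization API (ECMA-402, 2022), so it was not available when this thread was written. The following is a minimal sketch that assumes such an engine. `shortenGraphemes` is a hypothetical helper name, not one of the proposed methods:

```javascript
// Sketch, assuming an engine that provides Intl.Segmenter (ECMA-402, 2022).
// shortenGraphemes is a hypothetical helper, not a proposed built-in.
// It keeps at most maxGraphemes extended grapheme clusters and appends '…'
// when something was cut off, mirroring the quoted loop.
function shortenGraphemes(text, maxGraphemes = 100) {
  const segmenter = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
  let result = '';
  let count = 0;
  for (const { segment } of segmenter.segment(text)) {
    count += 1;
    if (count > maxGraphemes) {
      return result + '…';
    }
    result += segment;
  }
  return result;
}

// 'e' + U+0301 COMBINING ACUTE ACCENT is one grapheme cluster but two code points.
// Logs the first two clusters followed by '…'.
console.log(shortenGraphemes('e\u0301e\u0301e\u0301', 2));
```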

> As a side note, I ask whether the `String.prototype.symbolAt`/`String.prototype.at`
> as proposed in a recent thread, and the
> `String.prototype[@@iterator]` as currently specified, are really what people 
> need, or if they would mistakenly use them with the intended meaning of 
> `String.prototype.graphemeAt` and `String.prototype.graphemes` as discussed 
> in the present message?

I don’t think this would be an issue. The new `String` methods and the iterator 
are well-defined and documented in terms of *code points*.
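Here is a small illustration of what "in terms of code points" means in practice. It uses the string iterator, `codePointAt` and `String.fromCodePoint` as they were eventually standardized in ES2015. A surrogate pair counts as one unit, but a combining mark remains a separate one. The sample strings are arbitrary:

```javascript
// U+1F4A9 PILE OF POO: one code point, two UTF-16 code units.
const pileOfPoo = '\uD83D\uDCA9';
// 'e' + U+0301 COMBINING ACUTE ACCENT: one grapheme cluster, two code points.
const eAcute = 'e\u0301';

console.log(pileOfPoo.length);                            // 2 (code units)
console.log([...pileOfPoo].length);                       // 1 (iterator yields code points)
console.log(pileOfPoo.codePointAt(0).toString(16));       // '1f4a9'
console.log(String.fromCodePoint(0x1F4A9) === pileOfPoo); // true

console.log(eAcute.length);                               // 2
console.log([...eAcute].length);                          // 2: the mark is its own code point
```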

IMHO combining marks are easy enough to match and special-case in your code if
that’s what you need. You could use a regular expression to iterate over
(approximate) grapheme clusters in the string, i.e. a base code point followed
by any combining marks:

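    // Based on the example on
    // http://mathiasbynens.be/notes/javascript-unicode#accounting-for-other-combining-marks
    // Group 1: a single code point (a BMP code unit outside the combining-mark and
    // high-surrogate ranges, a surrogate pair, or a lone high surrogate).
    // Group 2: any combining marks that follow it.
    var regexGraphemeCluster = /([\0-\u02FF\u0370-\u1DBF\u1E00-\u20CF\u2100-\uD7FF\uDC00-\uFE1F\uFE30-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF])([\u0300-\u036F\u1DC0-\u1DFF\u20D0-\u20FF\uFE20-\uFE2F]*)/g;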
    
    var zalgo = 'Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ͫ͗͢L̠ͨͧͩ͘G̴̻͈͍͔̹̑͗̎̅͛́Ǫ̵̹̻̝̳͂̌̌͘!͖̬̰̙̗̿̋ͥͥ̂ͣ̐́́͜͞';
    
    zalgo.match(regexGraphemeCluster);
    // → [
    //     "Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍",
    //     "A̴̵̜̰͔ͫ͗͢",
    //     "L̠ͨͧͩ͘",
    //     "G̴̻͈͍͔̹̑͗̎̅͛́",
    //     "Ǫ̵̹̻̝̳͂̌̌͘",
    //     "!͖̬̰̙̗̿̋ͥͥ̂ͣ̐́́͜͞"
    //   ]
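The regular expression above can be wrapped into the two operations from the original proposal: an index lookup and truncation to a number of clusters. This is a minimal sketch. The helper names `graphemeClusters`, `graphemeAt` and `truncateGraphemes` are hypothetical. The results are only approximations of real grapheme clusters, because the pattern only attaches marks from the combining-diacritics blocks to a base code point. It does not handle Hangul jamo sequences, emoji ZWJ sequences and similar cases.

```javascript
// Same approximate pattern as above: one code point followed by any marks from
// U+0300–U+036F, U+1DC0–U+1DFF, U+20D0–U+20FF or U+FE20–U+FE2F.
const regexGraphemeCluster = /([\0-\u02FF\u0370-\u1DBF\u1E00-\u20CF\u2100-\uD7FF\uDC00-\uFE1F\uFE30-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF])([\u0300-\u036F\u1DC0-\u1DFF\u20D0-\u20FF\uFE20-\uFE2F]*)/g;

// Hypothetical helper: split text into approximate grapheme clusters.
// String#match with a /g pattern returns null when nothing matches (e.g. ''),
// and it skips combining marks that have no preceding base character.
function graphemeClusters(text) {
  return text.match(regexGraphemeCluster) || [];
}

// Hypothetical stand-in for the proposed text.graphemeAt(index).
function graphemeAt(text, index) {
  return graphemeClusters(text)[index];
}

// Hypothetical helper: keep at most maxClusters clusters, appending '…' if cut.
function truncateGraphemes(text, maxClusters) {
  const clusters = graphemeClusters(text);
  if (clusters.length <= maxClusters) {
    return text; // nothing to cut, so return the input unchanged
  }
  return clusters.slice(0, maxClusters).join('') + '…';
}

const sample = 'Cafe\u0301 na\u0308ive';
console.log(sample.length);                                   // 12 code units
console.log(graphemeClusters(sample).length);                 // 10 approximate clusters
console.log(graphemeAt(sample, 3) === 'e\u0301');             // true
console.log(truncateGraphemes(sample, 4) === 'Cafe\u0301…');  // true
```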
_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss
