On 24 Oct 2013, at 16:02, Claude Pache <[email protected]> wrote:
> Therefore, I propose the following basic operations to operate on grapheme
> clusters:
Out of curiosity, is there any programming language that operates on grapheme
clusters (rather than code points) by default? FWIW, code point iteration is
what I’d expect in any language.
> text.graphemeAt(0) // get the first grapheme of the text
>
> // shorten a text to its first hundred graphemes
> var shortenText = ''
> let numGraphemes = 0
> for (let grapheme of text) {
>     numGraphemes += 1
>     if (numGraphemes > 100) {
>         shortenText += '…'
>         break
>     }
>     shortenText += grapheme
> }
So, you would want to change the string iterator’s behavior too?
> As a side note, I ask whether the `String.prototype.symbolAt`/
> `String.prototype.at` as proposed in a recent thread, and the
> `String.prototype[@@iterator]` as currently specified, are really what people
> need, or if they would mistakenly use them with the intended meaning of
> `String.prototype.graphemeAt` and `String.prototype.graphemes` as discussed
> in the present message?
I don’t think this would be an issue. The new `String` methods and the iterator
are well-defined and documented in terms of *code points*.
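To make the code point behavior concrete, here is a minimal sketch. It assumes an engine that already implements the ES6 string iterator and `for...of`; the sample string and the variable names are made up for illustration.

```javascript
// Assumption: an ES6-capable engine (String.prototype[@@iterator], for...of).
// Sample text: 'e' + U+0301 COMBINING ACUTE ACCENT + U+1F4A9 (a surrogate pair).
var text = 'e\u0301\uD83D\uDCA9';

// The string iterator yields one item per code point, not per code unit
// and not per grapheme cluster.
var codePoints = [];
for (var symbol of text) {
  codePoints.push(symbol);
}

// text.length counts UTF-16 code units: 1 + 1 + 2.
console.log(text.length);
// The iterator keeps the surrogate pair together, but still yields the
// combining accent as an item separate from its base letter 'e'.
console.log(codePoints.length);
```

So the iterator gets astral symbols right, while a base letter and its combining mark still come out as two separate items.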
IMHO combining marks are easy enough to match and special-case in your code if
that’s what you need. You could use a regular expression to iterate over all
grapheme clusters in the string:
// Based on the example on
// http://mathiasbynens.be/notes/javascript-unicode#accounting-for-other-combining-marks
var regexGraphemeCluster = /([\0-\u02FF\u0370-\u1DBF\u1E00-\u20CF\u2100-\uD7FF\uDC00-\uFE1F\uFE30-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF])([\u0300-\u036F\u1DC0-\u1DFF\u20D0-\u20FF\uFE20-\uFE2F]*)/g;
var zalgo =
'Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍A̴̵̜̰͔ͫ͗͢L̠ͨͧͩ͘G̴̻͈͍͔̹̑͗̎̅͛́Ǫ̵̹̻̝̳͂̌̌͘!͖̬̰̙̗̿̋ͥͥ̂ͣ̐́́͜͞';
zalgo.match(regexGraphemeCluster);
[
"Z͑ͫ̓ͪ̂ͫ̽͏̴̙̤̞͉͚̯̞̠͍",
"A̴̵̜̰͔ͫ͗͢",
"L̠ͨͧͩ͘",
"G̴̻͈͍͔̹̑͗̎̅͛́",
"Ǫ̵̹̻̝̳͂̌̌͘",
"!͖̬̰̙̗̿̋ͥͥ̂ͣ̐́́͜͞"
]
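Along the same lines, the "shorten a text to its first hundred graphemes" example quoted above could be sketched with this regular expression. The helper name `shortenToGraphemes` is hypothetical, and the sketch only knows about the combining-mark ranges listed in the regex; it does not implement full Unicode grapheme cluster segmentation.

```javascript
// Same regular expression as above, repeated so this snippet runs on its own.
var regexGraphemeCluster = /([\0-\u02FF\u0370-\u1DBF\u1E00-\u20CF\u2100-\uD7FF\uDC00-\uFE1F\uFE30-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDBFF])([\u0300-\u036F\u1DC0-\u1DFF\u20D0-\u20FF\uFE20-\uFE2F]*)/g;

// Hypothetical helper: keep at most `maxGraphemes` clusters and append an
// ellipsis (U+2026) if anything was cut off.
function shortenToGraphemes(text, maxGraphemes) {
  // match() with the `g` flag returns every match, or null for no match.
  var clusters = text.match(regexGraphemeCluster) || [];
  if (clusters.length <= maxGraphemes) {
    return text; // short enough, return unchanged
  }
  // Caveat: combining marks at the very start of the string have no base
  // symbol, so the regex skips them and they are dropped when truncating.
  return clusters.slice(0, maxGraphemes).join('') + '\u2026';
}

// 'e' + acute, 'a' + grave, 'b': three clusters, cut down to two.
console.log(shortenToGraphemes('e\u0301a\u0300b', 2));
```

Joining whole matches means a base symbol is never separated from the combining marks that follow it.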
_______________________________________________
es-discuss mailing list
[email protected]
https://mail.mozilla.org/listinfo/es-discuss