Hi Yuya,

I think this is a good idea. While spec compliance is generally desirable,
DoS via unbounded grapheme clusters is a real threat, and it's reasonable
for a language-level implementation to impose practical limits that the
Unicode spec itself doesn't define. This kind of gap between a
general-purpose spec and a concrete implementation is not unusual.

The default of 32 code points sounds sensible given that natural language
grapheme clusters top out well below that.

One minor note: it might help to clarify the intended behavior of
`grapheme_limit_codepoints` a bit more, for instance whether it is meant
as a validation check (returning false when a cluster exceeds the limit) or
something else.
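
To make the validation reading concrete, here is a rough userland sketch.
It is only an illustration under my own assumptions, not the proposed C
implementation: it assumes ext/intl is loaded and the input is valid UTF-8,
uses `IntlBreakIterator` for UAX #29 segmentation, and counts code points per
cluster by skipping UTF-8 continuation bytes.

```php
<?php
// Hypothetical userland sketch of the proposed grapheme_limit_codepoints():
// returns false as soon as any grapheme cluster in $str has more than
// $max_codepoints code points. Assumes ext/intl and valid UTF-8 input.
function grapheme_limit_codepoints(string $str, int $max_codepoints = 32): bool
{
    $it = IntlBreakIterator::createCharacterInstance();
    $it->setText($str);

    // Boundaries are UTF-8 byte offsets; each [$start, $end) is one cluster.
    $start = $it->first();
    for ($end = $it->next(); $end !== IntlBreakIterator::DONE; $end = $it->next()) {
        // Count code points without copying the cluster: every byte that is
        // not a UTF-8 continuation byte (10xxxxxx) starts a new code point.
        $count = 0;
        for ($i = $start; $i < $end; $i++) {
            if ((ord($str[$i]) & 0xC0) !== 0x80 && ++$count > $max_codepoints) {
                return false;
            }
        }
        $start = $end;
    }
    return true;
}

// "e" + U+0301 COMBINING ACUTE ACCENT: one cluster of 2 code points.
var_dump(grapheme_limit_codepoints("e\u{301}"));    // bool(true)

// A ZWJ chain of 21 emoji: one cluster of 41 code points.
$zwj_chain = str_repeat("\u{1f468}\u{200d}", 20) . "\u{1f468}";
var_dump(grapheme_limit_codepoints($zwj_chain));     // bool(false)
var_dump(grapheme_limit_codepoints($zwj_chain, 64)); // bool(true)
```

Counting lead bytes avoids duplicating a huge cluster just to measure it,
although ICU still has to scan the whole cluster to find its end boundary.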

Regards,
Kentaro Takeda


On Mon, Feb 23, 2026 at 20:28, youkidearitai <[email protected]> wrote:

> Hi, Internals
>
> I noticed that UAX #29 does not limit the number of code points in a grapheme cluster.
> https://www.unicode.org/reports/tr29/
>
> An ICU issue also confirms that Unicode defines no such limit on code points.
> https://unicode-org.atlassian.net/browse/ICU-23302
>
> This means anyone can create a single grapheme cluster containing a huge
> number of code points, which can crash a program because computer resources are limited.
>
> For example, the following command writes about 200 MB to emoji_bomb.txt, all of which is a single grapheme cluster:
> ```
> php -d memory_limit=600M -r 'echo(mb_trim(str_repeat("\u{200d}\u{1f468}\u{200d}\u{1f466}\u{200d}\u{1f466}", 10000000), "\u{200d}"));' > emoji_bomb.txt
> ```
> (PLEASE BE CAREFUL WHEN OPENING emoji_bomb.txt, BECAUSE IT MAY CAUSE A CRASH)
>
> So I think we (php-src, at the programming-language level) need a new
> function that enforces a custom limit.
> My idea is below:
>
> ```
> grapheme_limit_codepoints(string $str, int $max_codepoints = 32): bool
> ```
>
> I don't have a strong opinion about 32 as the default for $max_codepoints.
> However, 32 code points should be enough for a grapheme cluster, because
> the longest cluster in human language is probably Hakṣhmalawarayaṁ (ཧ), at
> 9 code points.
>
> If more code points per grapheme cluster are needed,
> userland code can increase $max_codepoints.
>
> Please also see my Speaker Deck slides:
>
> https://speakerdeck.com/youkidearitai/limit-of-code-point-for-grapheme-cluster
>
> What do you think about this idea?
>
> Regards
> Yuya
>
> --
> ---------------------------
> Yuya Hamada (tekimen)
> - https://tekitoh-memdhoi.info
> - https://github.com/youkidearitai
> -----------------------------
>
