On Sat, Feb 28, 2026 at 0:59 youkidearitai <[email protected]> wrote:
>
> On Tue, Feb 24, 2026 at 16:21 youkidearitai <[email protected]> wrote:
> >
> > On Tue, Feb 24, 2026 at 11:38 Kentaro Takeda <[email protected]> wrote:
> > >
> > > Hi Yuya,
> > >
> > > I think this is a good idea. While spec compliance is generally 
> > > desirable, DoS via unbounded grapheme clusters is a real threat, and it's 
> > > reasonable for a language-level implementation to impose practical limits 
> > > that the Unicode spec itself doesn't define. This kind of gap between a 
> > > general-purpose spec and a concrete implementation is not unusual.
> > >
> > > The default of 32 code points sounds sensible given that natural language 
> > > grapheme clusters top out well below that.
> > >
> > > One minor note: it might help to clarify the intended behavior of 
> > > `grapheme_limit_codepoints` a bit more — for instance, whether it is 
> > > meant as a validation check (returning false when a cluster exceeds the 
> > > limit) or something else.
> > >
> > > Regards,
> > > Kentaro Takeda
> > >
> > >
> > > On Mon, Feb 23, 2026 at 20:28 youkidearitai <[email protected]> wrote:
> > >>
> > >> Hi, Internals
> > >>
> > >> I noticed that UAX #29 does not limit the number of code points in a
> > >> grapheme cluster.
> > >> https://www.unicode.org/reports/tr29/
> > >>
> > >> An ICU issue also confirmed that Unicode defines no such limit.
> > >> https://unicode-org.atlassian.net/browse/ICU-23302
> > >>
> > >> This means a single grapheme cluster can contain an unbounded number of
> > >> code points, which can crash a program because computer resources are
> > >> limited.
> > >>
> > >> For example, the following command writes a file, emoji_bomb.txt, of
> > >> about 200 MB that consists of a single grapheme cluster:
> > >> ```
> > >> php -r 'echo(mb_trim(str_repeat("\u{200d}\u{1f468}\u{200d}\u{1f466}\u{200d}\u{1f466}", 10000000), "\u{200d}"));' -d memory_limit=600M > emoji_bomb.txt
> > >> ```
> > >> (PLEASE BE CAREFUL WHEN OPENING emoji_bomb.txt; IT MAY CRASH THE PROGRAM
> > >> YOU OPEN IT WITH)
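
To see the effect without writing a 200 MB file, a scaled-down variant can be checked safely. This is a hypothetical demo, not part of the proposal: it builds the cluster directly (one MAN emoji followed by `$n` "ZWJ + BOY" pairs) instead of trimming a leading ZWJ with mb_trim(), and it assumes the PCRE2 library bundled with PHP applies the Unicode ZWJ-sequence rule (GB11) when matching `\X`.

```php
<?php
// Hypothetical scaled-down emoji bomb: MAN + $n x (ZWJ + BOY).
// Assumption: PCRE2's \X follows UAX #29 rule GB11, so the whole string
// should form one extended grapheme cluster.
$n = 1000;
$bomb = "\u{1F468}" . str_repeat("\u{200D}\u{1F466}", $n);

$graphemes  = preg_match_all('/\X/u', $bomb);  // extended grapheme clusters
$codepoints = preg_match_all('/./su', $bomb);  // code points ('.' matches one in /u mode)
$bytes      = strlen($bomb);                   // UTF-8 bytes

printf("%d grapheme cluster(s), %d code points, %d bytes\n", $graphemes, $codepoints, $bytes);
```

With `$n = 1000` the string has 2001 code points in 7004 bytes, and under the GB11 assumption it is still one grapheme cluster; increasing `$n` grows that single cluster without bound.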
> > >>
> > >> So I think we (php-src, at the programming-language level) need a new
> > >> function that enforces a custom limit.
> > >> My idea is as follows:
> > >>
> > >> ```
> > >> grapheme_limit_codepoints(string $str, int $max_codepoints = 32): bool
> > >> ```
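
A minimal userland sketch of the proposed check, under stated assumptions: `grapheme_limit_codepoints_sketch()` is a hypothetical name (it is not the code in the PoC), a "grapheme cluster" is taken to be whatever PCRE's `\X` matches, and invalid UTF-8 is treated as failing the check. It returns false as soon as any cluster exceeds `$max_codepoints`.

```php
<?php
/**
 * Hypothetical userland sketch of the proposed check (not the PoC's code).
 * Returns false if any extended grapheme cluster in $str has more than
 * $max_codepoints code points, or if $str is not valid UTF-8.
 * Assumption: a grapheme cluster is whatever PCRE's \X matches.
 */
function grapheme_limit_codepoints_sketch(string $str, int $max_codepoints = 32): bool
{
    // Split into extended grapheme clusters; preg_match_all() returns false on invalid UTF-8.
    if (preg_match_all('/\X/u', $str, $matches) === false) {
        return false;
    }

    foreach ($matches[0] as $cluster) {
        // In /u mode '.' matches one code point; /s lets it match newlines too.
        if (preg_match_all('/./su', $cluster) > $max_codepoints) {
            return false;
        }
    }

    return true;
}

var_dump(grapheme_limit_codepoints_sketch("e\u{0301}"));                        // bool(true)
var_dump(grapheme_limit_codepoints_sketch("e" . str_repeat("\u{0301}", 40)));   // bool(false)
```

Note that this sketch still materializes every cluster in memory, so it only illustrates the intended semantics; a native implementation could stop counting as soon as the limit is crossed.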
> > >>
> > >> I don't have a strong opinion on 32 as the default for $max_codepoints.
> > >> However, 32 code points should be enough for a grapheme cluster, because
> > >> the longest grapheme cluster in human language is probably
> > >> Hakṣhmalawarayaṁ (ཧ), at 9 code points.
> > >>
> > >> If a grapheme cluster needs more code points than that, userland code
> > >> can increase $max_codepoints.
> > >>
> > >> Please see also my speakerdeck.
> > >> https://speakerdeck.com/youkidearitai/limit-of-code-point-for-grapheme-cluster
> > >>
> > >> What do you think about this idea?
> > >>
> > >> Regards
> > >> Yuya
> > >>
> > >> --
> > >> ---------------------------
> > >> Yuya Hamada (tekimen)
> > >> - https://tekitoh-memdhoi.info
> > >> - https://github.com/youkidearitai
> > >> -----------------------------
> >
> > Hi, Kentaro
> >
> > Thank you very much for your feedback.
> >
> > > One minor note: it might help to clarify the intended behavior of 
> > > `grapheme_limit_codepoints` a bit more — for instance, whether it is 
> > > meant as a validation check (returning false when a cluster exceeds the 
> > > limit) or something else.
> >
> > Okay, here is an example of the intended usage.
> >
> > ```
> > // Some user-supplied string in $_POST['text']
> > $text = $_POST['text'] ?? '';
> >
> > // Reject input in which any grapheme cluster has too many code points.
> > if (grapheme_limit_codepoints($text, 32) !== true) {
> >     throw new InvalidArgumentException("Found a grapheme cluster with too many code points");
> > }
> >
> > // Validate the length in grapheme clusters
> > if (grapheme_strlen($text) > 100) {
> >     throw new InvalidArgumentException("Input is longer than 100 graphemes");
> > }
> >
> > // ... process the validated input
> > ```
> > The intention is to count graphemes correctly while avoiding DoS.
> > I would also like to overcome the issue raised in
> > https://github.com/symfony/symfony/pull/13527 for the grapheme_strlen
> > function.
> >
> > Feel free to comment further.
> > Regards
> > Yuya.
> >
> > --
> > ---------------------------
> > Yuya Hamada (tekimen)
> > - https://tekitoh-memdhoi.info
> > - https://github.com/youkidearitai
> > -----------------------------
>
> Hi, Internals
>
> I have created a PoC and an RFC.
> https://github.com/php/php-src/pull/21311
> https://wiki.php.net/rfc/grapheme_limit_codepoints
>
> I have also asked Unicode to add a limit on the number of code points
> per grapheme cluster to UAX #29.
> Unicode may adopt my suggestion if it makes sense, but I don't know what
> will happen.
>
> Either way, I think it makes sense to limit the number of code points per
> grapheme cluster on the PHP side.
>
> Feel free to comment.
>
> Regards
> Yuya
>
> --
> ---------------------------
> Yuya Hamada (tekimen)
> - https://tekitoh-memdhoi.info
> - https://github.com/youkidearitai
> -----------------------------

Hi, Internals

I reported this issue to Unicode and received the following reply:

> Thank you for your feedback and your interest in Unicode.
> Your feedback will be reviewed by one of Unicode’s working groups.
> If appropriate, it may be posted to the PRI feedback page or be made part of 
> a list of general feedback that will be considered for the next quarterly UTC 
> meeting.

My understanding is that, if appropriate, it will be posted to the PRI
feedback page (https://www.unicode.org/review/) or considered at a UTC
meeting.

I'm going to wait and see.

Regards
Yuya

-- 
---------------------------
Yuya Hamada (tekimen)
- https://tekitoh-memdhoi.info
- https://github.com/youkidearitai
-----------------------------
