On Sat, Feb 28, 2026 at 0:59, youkidearitai <[email protected]> wrote:
>
> On Tue, Feb 24, 2026 at 16:21, youkidearitai <[email protected]> wrote:
> >
> > On Tue, Feb 24, 2026 at 11:38, Kentaro Takeda <[email protected]> wrote:
> > >
> > > Hi Yuya,
> > >
> > > I think this is a good idea. While spec compliance is generally
> > > desirable, DoS via unbounded grapheme clusters is a real threat, and it's
> > > reasonable for a language-level implementation to impose practical limits
> > > that the Unicode spec itself doesn't define. This kind of gap between a
> > > general-purpose spec and a concrete implementation is not unusual.
> > >
> > > The default of 32 code points sounds sensible given that natural language
> > > grapheme clusters top out well below that.
> > >
> > > One minor note: it might help to clarify the intended behavior of
> > > `grapheme_limit_codepoints` a bit more, for instance whether it is
> > > meant as a validation check (returning false when a cluster exceeds the
> > > limit) or something else.
> > >
> > > Regards,
> > > Kentaro Takeda
> > >
> > >
> > > On Mon, Feb 23, 2026 at 20:28, youkidearitai <[email protected]> wrote:
> > >>
> > >> Hi, Internals
> > >>
> > >> I noticed that UAX#29 does not limit the number of code points in a
> > >> grapheme cluster.
> > >> https://www.unicode.org/reports/tr29/
> > >>
> > >> An ICU issue also confirmed that Unicode defines no such limit.
> > >> https://unicode-org.atlassian.net/browse/ICU-23302
> > >>
> > >> This means a single grapheme cluster can be built from an arbitrarily
> > >> large number of code points, which can crash a program because
> > >> computing resources are limited.
> > >>
> > >> For example, the following command writes emoji_bomb.txt, a 200MB file
> > >> that is a single grapheme cluster:
> > >> ```
> > >> php -d memory_limit=600M -r 'echo(mb_trim(str_repeat("\u{200d}\u{1f468}\u{200d}\u{1f466}\u{200d}\u{1f466}", 10000000), "\u{200d}"));' > emoji_bomb.txt
> > >> ```
> > >> (PLEASE BE CAREFUL WHEN OPENING emoji_bomb.txt; IT MAY CAUSE A CRASH.)
> > >>
> > >> So I think we (php-src, i.e. at the programming-language level) need a
> > >> new function that enforces a custom limit.
> > >> My idea is:
> > >>
> > >> ```
> > >> grapheme_limit_codepoints(string $str, int $max_codepoints = 32): bool
> > >> ```
> > >>
> > >> I don't have a strong opinion on 32 as the default for $max_codepoints.
> > >> However, 32 code points should be plenty for a grapheme cluster: the
> > >> longest cluster in human language is probably Hakṣhmalawarayaṁ (ཧ),
> > >> at 9 code points.
> > >>
> > >> If more code points per grapheme cluster are needed, userland code
> > >> can increase $max_codepoints.
> > >>
> > >> Please also see my Speaker Deck slides:
> > >> https://speakerdeck.com/youkidearitai/limit-of-code-point-for-grapheme-cluster
> > >>
> > >> What do you think about this idea?
> > >>
> > >> Regards
> > >> Yuya
> > >>
> > >> --
> > >> ---------------------------
> > >> Yuya Hamada (tekimen)
> > >> - https://tekitoh-memdhoi.info
> > >> - https://github.com/youkidearitai
> > >> -----------------------------
> >
> > Hi, Kentaro
> >
> > Thank you very much for your feedback.
> >
> > > One minor note: it might help to clarify the intended behavior of
> > > `grapheme_limit_codepoints` a bit more, for instance whether it is
> > > meant as a validation check (returning false when a cluster exceeds the
> > > limit) or something else.
> >
> > Okay, here is an example:
> >
> > ```
> > // Some string in $_POST['text']
> > // Reject input containing a grapheme cluster with too many code points.
> > if (grapheme_limit_codepoints($_POST['text'], 32) !== true) {
> >     throw new InvalidArgumentException("Found invalid input or too many code points in a grapheme cluster");
> > }
> >
> > // Validate the length in grapheme clusters
> > if (grapheme_strlen($_POST['text']) > 100) {
> >     throw new InvalidArgumentException("Input is longer than 100 graphemes");
> > }
> >
> > // do anything...
> > ```
> > The intention is to count graphemes correctly while avoiding DoS.
> > I also want to overcome the problem raised in
> > https://github.com/symfony/symfony/pull/13527 for the grapheme_strlen
> > function.
> >
> > Feel free to comment further.
> > Regards
> > Yuya
> >
> > --
> > ---------------------------
> > Yuya Hamada (tekimen)
> > - https://tekitoh-memdhoi.info
> > - https://github.com/youkidearitai
> > -----------------------------
>
> Hi, Internals
>
> I created a PoC and an RFC:
> https://github.com/php/php-src/pull/21311
> https://wiki.php.net/rfc/grapheme_limit_codepoints
>
> I have asked Unicode to add a code point limit for grapheme clusters to
> UAX#29. Unicode may adopt my suggestion if it makes sense, but I don't
> know what will happen.
>
> In any case, I think it makes sense to limit the number of code points
> per grapheme cluster on the PHP side.
>
> Feel free to comment.
>
> Regards
> Yuya
>
> --
> ---------------------------
> Yuya Hamada (tekimen)
> - https://tekitoh-memdhoi.info
> - https://github.com/youkidearitai
> -----------------------------
Hi, Internals

I reported this topic to Unicode and received the following reply:

> Thank you for your feedback and your interest in Unicode.
> Your feedback will be reviewed by one of Unicode’s working groups.
> If appropriate, it may be posted to the PRI feedback page or be made part of
> a list of general feedback that will be considered for the next quarterly UTC
> meeting.

My understanding is that, if it is deemed appropriate, the feedback will be
posted on the PRI page (https://www.unicode.org/review/) or considered at a
UTC meeting. I'm going to wait and see.

Regards
Yuya

--
---------------------------
Yuya Hamada (tekimen)
- https://tekitoh-memdhoi.info
- https://github.com/youkidearitai
-----------------------------
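Until a native function exists, a rough userland approximation of the proposed check can be written with PCRE's `\X` escape, which matches one extended grapheme cluster. This is a sketch under stated assumptions, not the RFC's or the PoC's implementation: `grapheme_limit_codepoints_polyfill()` is a hypothetical name, it assumes PHP 8.0+ (for `ValueError`), and it assumes PCRE2's grapheme segmentation is close enough to ICU's for validation purposes.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical userland sketch of the proposed grapheme_limit_codepoints().
 * This is NOT a PHP or ext/intl function; the _polyfill suffix avoids a
 * clash with any future native implementation.
 *
 * Returns true if every extended grapheme cluster in $str contains at most
 * $max_codepoints code points. Returns false otherwise, and also for invalid
 * UTF-8 (preg_match() fails on malformed input under the /u modifier).
 */
function grapheme_limit_codepoints_polyfill(string $str, int $max_codepoints = 32): bool
{
    if ($max_codepoints < 1) {
        throw new ValueError('$max_codepoints must be greater than 0');
    }

    $offset = 0;
    $length = strlen($str); // byte length; all offsets below are byte offsets

    while ($offset < $length) {
        // \G anchors the match at $offset; \X matches exactly one extended
        // grapheme cluster as PCRE2 defines it.
        if (preg_match('/\G\X/u', $str, $m, 0, $offset) !== 1) {
            return false; // invalid UTF-8
        }

        // Count code points in this cluster: with /su, "." matches exactly
        // one code point, newlines included.
        $codepoints = preg_match_all('/./su', $m[0]);
        if ($codepoints === false || $codepoints > $max_codepoints) {
            return false;
        }

        $offset += strlen($m[0]); // advance to the next cluster boundary
    }

    return true;
}
```

In the validation example earlier in the thread, `grapheme_limit_codepoints($_POST['text'], 32)` could be swapped for `grapheme_limit_codepoints_polyfill($_POST['text'], 32)`. One limitation: PCRE still consumes each whole cluster before it is counted, so this bounds the answer but not the work done on a cluster like the one in emoji_bomb.txt; a native implementation could stop as soon as a cluster exceeds the limit.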
