On 26 March 2024 17:04:18 GMT, Casper Langemeijer <langemei...@php.net> wrote:
>I'd like to address an issue I have with this RFC.

Please don't top reply.

>I'm not sure it solves a problem by itself. If I understand all of this
>correctly this only does what already can be accomplished with
>preg_match_all('/\X/u', ...). The result of this method in my opinion is not
>very useful by itself. I've done some searching on various code platforms
>where I mostly find the use-case for counting the number of graphemes. I've
>used it to implement strrev() that correctly works multibyte.
>
>I'm very sad that mbstring works on codepoints instead of graphemes and I
>would very much like to see something happening in that area, but I think
>expanding a simple string to an array of as many elements to give developers a
>tool to do this in PHP-space is not good enough. Especially since it can
>already be achieved with a regexp that already works.
>
>In my opinion: This adds nothing, and tells the PHP developer that it is ok to do
>count(grapheme_str_split()) for a more accurate mb_strlen().
>
>I would like to see a family of functions that can do multibyte str_split(),
>strrev(), substr(). Ideally as bugfix in mb_* functions, because the edge-case
>of wanting to know the length in codepoints of a string is a weird edge-case.
>No developer wants to know that. mb_strlen() should have returned the number
>of graphemes from the start.

Many of these already exist, such as grapheme_substr. We can't simply change
the behaviour of the already existing functions due to BC reasons.

The intl extension is also built on ICU, an actual Unicode text processing
library. The grapheme_str_split function, as well as other intl extension
functions, is what should replace mbstring really.

cheers
Derick
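
For illustration, a minimal sketch of the code point versus grapheme distinction discussed above. It assumes the mbstring and intl extensions are loaded and uses only functions that already exist (mb_strlen, grapheme_strlen, grapheme_substr, preg_match_all); it deliberately does not call grapheme_str_split, the function proposed in the RFC. The helper name example_grapheme_strrev is hypothetical and only shows the regexp approach mentioned in the quoted message.

```php
<?php
// Sketch only: requires the mbstring and intl extensions.
// "e" followed by U+0301 COMBINING ACUTE ACCENT, then "a":
// three code points, but two user-perceived characters (graphemes).
$s = "e\u{0301}a";

// mbstring counts code points.
$codepoints = mb_strlen($s, 'UTF-8');   // 3

// intl counts grapheme clusters.
$graphemes = grapheme_strlen($s);       // 2

// intl can also slice by grapheme: the first cluster keeps its accent.
$first = grapheme_substr($s, 0, 1);     // "e\u{0301}"

// The regexp route from the quoted message: with the /u modifier,
// \X matches one extended grapheme cluster.
preg_match_all('/\X/u', $s, $m);
$clusters = $m[0];                      // ["e\u{0301}", "a"]

// Hypothetical helper (not part of PHP): reverse a string by grapheme
// so that combining marks stay attached to their base character.
function example_grapheme_strrev(string $s): string
{
    preg_match_all('/\X/u', $s, $m);
    return implode('', array_reverse($m[0]));
}

$reversed = example_grapheme_strrev($s); // "a" followed by "e\u{0301}"
```

A byte-wise strrev() on the same input would reverse the bytes of the multibyte combining mark and produce invalid UTF-8, which is the kind of breakage the grapheme-aware functions avoid.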