On Tue, 12 May 2026, youkidearitai wrote:

> On Fri, 16 Dec 2022 at 0:34, Derick Rethans <[email protected]> wrote:
>
> > I have just published an initial draft of the "Unicode Text
> > Processing" RFC, a proposal to have performant unicode text
> > processing always available to PHP users, by introducing a new
> > "Text" class.
> >
> > You can find it at:
> > https://wiki.php.net/rfc/unicode_text_processing
> >
> > I'm looking forwards to hearing your opinions, additions, and
> > suggestions - the RFC specifically asks for these in places.
>
> Is still available this topic?
> I have interesting this Text class.
> I'm glad to control based on grapheme cluster such as Swift's string type.
I still have interest in working this out into supporting even more
things. Since I wrote that Draft RFC, I did add a few more features:
https://github.com/derickr/php-text/commits/main/

> I have some idea.
>
> 1. Move to Intl extension such as \Intl\Text
> * I think keep it simple for implementation.

I don't agree with this. Although it builds on top of ICU like the
classes in the Intl extension, it isn't following ICU's API style at
all. It is meant to be a much more opinionated API that does the simple
80% case well.

> 2. Add Text type for grapheme_* function only such as string|Text.
> * It is some complexy for implementation but userland is simple

I am not too sure about this. The grapheme_* functions closely match
ICU's internal, and powerful, API. If you want them to accept a Text
object too, that means the signatures of these grapheme_* functions
need to be overloaded. For example:

    grapheme_strstr(string $haystack, string $needle,
        bool $beforeNeedle = false, string $locale = ""
    ): string|false

would need to change into:

    grapheme_strstr(string|Text $haystack, string|Text $needle,
        bool $beforeNeedle = false, string $locale = ""
    ): string|false

And then '$locale' makes no sense, as this is already part of each of
the Text objects themselves.

Instead, the 'contains' method on the Text object already does
something very similar:
https://github.com/derickr/php-text/blob/main/tests/text-contains.phpt

I think the grapheme functions should stay as they are, and additional
methods can be added on the Text class where functionality is
currently missing that the grapheme_* functions already support. The
RFC document also already lists more functions than I have implemented
so far.

> 3. If UTF-8 validaion failed, throws an exception

It already does that, see this test case:
https://github.com/derickr/php-text/blob/main/tests/text-in-out-basic.phpt#L13
- although the exception message itself could be improved.

> __toString method returns string type is seems good.
> Please consider this.

This is already implemented too:
https://github.com/derickr/php-text/blob/main/text.c#L323

cheers,
Derick

-- 
https://derickrethans.nl | https://xdebug.org | https://dram.io

Author of Xdebug. Like it? Consider supporting me: https://xdebug.org/support
mastodon: @[email protected] @[email protected]
