On 03.03.20 at 14:29, Nicolas Grekas wrote:
> On Tue, 3 Mar 2020 at 11:04, Rowan Tommins <rowan.coll...@gmail.com>
> wrote:
>
>> On Tue, 3 Mar 2020 at 08:46, Andreas Heigl <andr...@heigl.org> wrote:
>>
>>> While it is mainly aimed at being a mere convenience-function that could
>>> also be easily implemented in userland it misses one main thing IMO when
>>> handling unicode-strings: Normalization.
>>
>> While I would love to see more functionality for handling Unicode which
>> didn't treat it as just another character set, I don't think sprinkling it
>> into the main string functions of the language would be the right approach.
>> Even if we changed all the existing functions to be "Unicode-aware", as was
>> planned for PHP 6, the resulting API would not handle all cases correctly.
>>
>> In this case, a Unicode-based string API ought to provide at least two
>> variants of "contains", as options or separate functions:
>>
>> - a version which matches on code point, for answering queries like "does
>>   this string contain right-to-left override characters?"
>> - at least one form of normalization, but probably several
>>
>> If there was serious work on a new string API in progress, a freeze on
>> additions to the current API would make sense; but right now, the
>> byte-based string API is what we have, and I think this function is a
>> sensible addition to it.
>
> FYI, I wrote a String handling lib, shipped as Symfony String:
> - doc: https://symfony.com/doc/current/components/string.html
> - src: https://github.com/symfony/string
>
> TL;DR, it provides 3 classes of value objects, dealing with bytes, code
> points and grapheme cluster (~= normalized unicode)
>
> It makes no sense to have `str_contains()` or any global function able to
> deal with Unicode normalization *unless* the PHP string values embed their
> unit system (one of: bytes, codepoints or graphemes).
>
> With this rationale, I agree with Rowan: PHP's native string functions deal
> with bytes. So should str_contains(). Other unit systems can be implemented
> in userland (until PHP implements something similar to Symfony String in
> core - but that's another topic.)
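To make the byte-based point concrete, here is a rough sketch of what a userland version might look like. The function name `userland_str_contains` is made up for illustration, the empty-needle behaviour is an assumption, and the normalization step assumes the intl extension is loaded; none of this is taken from the RFC patch itself.

```php
<?php
// Hypothetical userland stand-in for a byte-based str_contains().
// The name is illustrative only; it is not the RFC implementation.
function userland_str_contains(string $haystack, string $needle): bool
{
    // Assumption: an empty needle counts as "contained".
    return $needle === '' || strpos($haystack, $needle) !== false;
}

$precomposed = "caf\u{00E9}"; // "é" as the single code point U+00E9
$decomposed  = "e\u{0301}";   // "e" followed by combining acute U+0301

// Plain prefix: the bytes match.
var_dump(userland_str_contains($precomposed, "caf"));       // bool(true)

// Visually the same "é", but a different byte sequence, so no match.
var_dump(userland_str_contains($precomposed, $decomposed)); // bool(false)

// With ext-intl available, normalizing both sides to NFC first
// makes the comparison succeed.
if (class_exists('Normalizer')) {
    $h = Normalizer::normalize($precomposed, Normalizer::FORM_C);
    $n = Normalizer::normalize($decomposed, Normalizer::FORM_C);
    var_dump(userland_str_contains($h, $n));                // bool(true)
}
```

The byte-based check is a one-liner; the normalization step is where the extra dependency and the choice of normalization form come in.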
str_contains as it is currently implemented can also easily be implemented
in userland. That was my reasoning. I would think otherwise if it took
Unicode into account, as that is much harder to implement in userland.

And I didn't want to start a new discussion, I merely wanted to explain the
reasoning behind my decision.

Cheers

Andreas

--
                                                              ,,,
                                                             (o o)
+---------------------------------------------------------ooO-(_)-Ooo-+
| Andreas Heigl                                                       |
| mailto:andr...@heigl.org                  N 50°22'59.5" E 08°23'58" |
| http://andreas.heigl.org                       http://hei.gl/wiFKy7 |
+---------------------------------------------------------------------+
| http://hei.gl/root-ca                                               |
+---------------------------------------------------------------------+