On 03.03.20 at 14:29, Nicolas Grekas wrote:
> On Tue, 3 Mar 2020 at 11:04, Rowan Tommins <rowan.coll...@gmail.com>
> wrote:
> 
>> On Tue, 3 Mar 2020 at 08:46, Andreas Heigl <andr...@heigl.org> wrote:
>>
>>>
>>> While it is mainly aimed at being a mere convenience-function that could
>>> also be easily implemented in userland it misses one main thing IMO when
>>> handling unicode-strings: Normalization.
>>>
>>>
>>
>> While I would love to see more functionality for handling Unicode which
>> didn't treat it as just another character set, I don't think sprinkling it
>> into the main string functions of the language would be the right approach.
>> Even if we changed all the existing functions to be "Unicode-aware", as was
>> planned for PHP 6, the resulting API would not handle all cases correctly.
>>
>> In this case, a Unicode-based string API ought to provide at least two
>> variants of "contains", as options or separate functions:
>>
>> - a version which matches on code point, for answering queries like "does
>> this string contain right-to-left override characters?"
>> - at least one form of normalization, but probably several
>>
>> If there was serious work on a new string API in progress, a freeze on
>> additions to the current API would make sense; but right now, the
>> byte-based string API is what we have, and I think this function is a
>> sensible addition to it.
>>
> 
> 
> FYI, I wrote a String handling lib, shipped as Symfony String:
> - doc: https://symfony.com/doc/current/components/string.html
> - src: https://github.com/symfony/string
> 
> TL;DR, it provides 3 classes of value objects, dealing with bytes, code
> points and grapheme clusters (~= normalized Unicode)
> 
> It makes no sense to have `str_contains()` or any global function able to
> deal with Unicode normalization *unless* the PHP string values embed their
> unit system (one of: bytes, codepoints or graphemes).
> 
> With this rationale, I agree with Rowan: PHP's native string functions deal
> with bytes. So should str_contains(). Other unit systems can be implemented
> in userland (until PHP implements something similar to Symfony String in
> core - but that's another topic.)

str_contains() as it is currently implemented can just as easily be
implemented in userland. That was my reasoning. I would think differently
if it took Unicode into account, as that is much harder to implement in
userland.
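
A minimal sketch of that point, under stated assumptions: the function name
user_str_contains() is hypothetical, the empty-needle behaviour is an
assumption rather than something settled in this thread, and the
normalization step assumes the intl extension is available. It shows that
the byte-based check is a one-liner, while two spellings of the same
visible text only match after both sides are normalized.

```php
<?php
// Hypothetical userland, byte-based equivalent of the proposed function.
function user_str_contains(string $haystack, string $needle): bool
{
    // Assumption: an empty needle counts as always contained.
    return $needle === '' || strpos($haystack, $needle) !== false;
}

// "é" as one precomposed code point (U+00E9) ...
$precomposed = "caf\u{00E9}";
// ... and as "e" followed by a combining acute accent (U+0301).
$decomposed = "cafe\u{0301}";

// Byte comparison: only the precomposed spelling contains U+00E9.
var_dump(user_str_contains($precomposed, "\u{00E9}")); // bool(true)
var_dump(user_str_contains($decomposed, "\u{00E9}"));  // bool(false)

// Normalizing both sides to NFC first (requires ext-intl) makes them match.
if (class_exists('Normalizer')) {
    $h = Normalizer::normalize($decomposed, Normalizer::FORM_C);
    $n = Normalizer::normalize("\u{00E9}", Normalizer::FORM_C);
    var_dump(user_str_contains($h, $n)); // bool(true)
}
```

Which normalization form to use, and whether to match on code points or on
grapheme clusters, are exactly the choices a byte-based function cannot
make on the caller's behalf.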

And I didn't want to start a new discussion; I merely wanted to explain
the reasoning behind my decision.

Cheers

Andreas
-- 
                                                              ,,,
                                                             (o o)
+---------------------------------------------------------ooO-(_)-Ooo-+
| Andreas Heigl                                                       |
| mailto:andr...@heigl.org                  N 50°22'59.5" E 08°23'58" |
| http://andreas.heigl.org                       http://hei.gl/wiFKy7 |
+---------------------------------------------------------------------+
| http://hei.gl/root-ca                                               |
+---------------------------------------------------------------------+

