Hey all.

Just a short note on why I voted against the current implementation of the str_contains functionality.
While it is mainly meant as a mere convenience function that could also easily be implemented in userland, it misses one main thing IMO when handling Unicode strings: normalization.

It is correct that the binary representation of the string "äöüß" within the string "Täöüßtstring" seems to be the same, and that a simple `strpos('Täöüßtstring', 'äöüß')` returns a non-false result. But with Unicode, the two strings might use different normalization forms. To the human eye the two strings then look (almost) identical, but internally they are completely different (and even mb_strpos might not be able to detect the similarity). See https://3v4l.org/fasO4 for more information.

As we are creating new functionality here, it would have been great to solve this issue. But as it is IMO merely a convenience add-on that can easily be implemented in userland, I vote against it.

Cheers

Andreas

On 17.02.20 at 15:23, Rowan Tommins wrote:
> On Mon, 17 Feb 2020 at 13:38, Pierre Joye <pierre....@gmail.com> wrote:
>
>> Btw, while some mbstring references I I mentioned, I do like the ICU search
>> implementation as well.
>>
>> http://userguide.icu-project.org/collation/icu-string-search-service
>>
>> It handles a lot of cases based on locales.
>
> That's a lovely example of why treating Unicode as a character encoding is
> the wrong mindset.
>
> I would love to see more people using ext/intl rather than ext/mbstring,
> and more ICU features like this being included.
>
> Regards,

--
Andreas Heigl
mailto:andr...@heigl.org
N 50°22'59.5" E 08°23'58"
http://andreas.heigl.org
http://hei.gl/wiFKy7
http://hei.gl/root-ca
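
P.S. To make the normalization point above concrete, here is a minimal userland sketch. It assumes ext/intl is loaded, and the function name `str_contains_normalized` is hypothetical (it is not part of the RFC or of PHP). It brings both strings into NFC with `Normalizer::normalize()` before the byte-wise `strpos()` check.

```php
<?php
// Hypothetical helper (not part of the RFC): substring check that
// ignores differences in Unicode normalization. Requires ext/intl.
function str_contains_normalized(string $haystack, string $needle): bool
{
    // Bring both strings into the same normalization form (NFC).
    $h = Normalizer::normalize($haystack, Normalizer::FORM_C);
    $n = Normalizer::normalize($needle, Normalizer::FORM_C);

    // normalize() returns false for invalid UTF-8 input.
    if ($h === false || $n === false) {
        return false;
    }

    // An empty needle is treated as always contained.
    return $n === '' || strpos($h, $n) !== false;
}

// "ä" as one precomposed code point (U+00E4) ...
$composed = "T\u{00E4}st";
// ... and as "a" followed by a combining diaeresis (U+0308).
$decomposed = "a\u{0308}";

var_dump(strpos($composed, $decomposed) !== false);        // bool(false): the bytes differ
var_dump(str_contains_normalized($composed, $decomposed)); // bool(true): equal after NFC
```

This only covers normalization. It does not attempt case folding or locale-aware matching, which is what the ICU string search mentioned in the quoted mail is aimed at.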